Abstract

In this paper, we consider adaptive estimation of an unknown planar compact, convex set 
from noisy measurements of its support function on a uniform grid. Both the problem of 
estimating the support function at a point and that of estimating the convex set are studied. 
Data-driven adaptive estimators are proposed and their optimality properties are established. 

For pointwise estimation, it is shown that the estimator optimally adapts to every compact, 
convex set instead of a collection of large parameter spaces as in the conventional minimax theory 
in nonparametric estimation literature. For set estimation, the estimators adaptively achieve 
the optimal rate of convergence. In both these problems, our analysis makes no smoothness 
assumptions on the unknown sets. 

Keywords: Adaptive estimation, circle convexity, convex set, minimax rate of convergence, support function.

1 Introduction

We study in this paper the problem of nonparametric estimation of an unknown planar compact, 
convex set from noisy measurements of its support function. Before describing the details of the 
problem, let us first introduce the support function. For a compact, convex set $K$ in $\mathbb{R}^2$, its support
function is defined by

$$h_K(\theta) := \max_{(x_1, x_2) \in K} (x_1 \cos\theta + x_2 \sin\theta) \quad \text{for } \theta \in \mathbb{R}.$$

Note that $h_K$ is a periodic function with period $2\pi$. It is useful to think about $\theta$ in terms of
the direction $(\cos\theta, \sin\theta)$. The line $x_1 \cos\theta + x_2 \sin\theta = h_K(\theta)$ is a support line for $K$ (i.e., it
touches $K$ and $K$ lies on one side of it). Conversely, every support line of $K$ is of this form
for some $\theta$. The convex set $K$ is completely determined by its support function $h_K$ because

$$K = \bigcap_{\theta} \left\{ (x_1, x_2) : x_1 \cos\theta + x_2 \sin\theta \le h_K(\theta) \right\}.$$

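The support function $h_K$ possesses the circle-convexity property (see, e.g., Vitale (1979)): for
every $\alpha_1 > \alpha > \alpha_2$ with $0 < \alpha_1 - \alpha_2 < \pi$,

$$\frac{h_K(\alpha_1)}{\sin(\alpha_1 - \alpha)} + \frac{h_K(\alpha_2)}{\sin(\alpha - \alpha_2)} \ge \frac{\sin(\alpha_1 - \alpha_2)}{\sin(\alpha_1 - \alpha)\,\sin(\alpha - \alpha_2)}\, h_K(\alpha). \tag{1}$$
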
Moreover the above inequality characterizes $h_K$, i.e., any periodic function of period $2\pi$ satisfying
the above inequality equals $h_K$ for a unique compact, convex subset $K$ in $\mathbb{R}^2$. The circle-convexity
property (1) is clearly related to the usual convexity property. Indeed, if we replace the sine function
in (1) by the identity function (i.e., if we replace $\sin\alpha$ by $\alpha$ in (1)), we obtain the condition for
convexity. In spite of this similarity, (1) is different from convexity as can be seen from the example
of the function $h(\theta) = |\sin\theta|$ which satisfies (1) but is clearly not convex.
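
The contrast between (1) and ordinary convexity can be spot-checked numerically. The following is only an illustrative sketch: the function names are hypothetical, and random sampling gives evidence for (1) in the case $h(\theta) = |\sin\theta|$ rather than a proof.

```python
import numpy as np

def circle_convexity_gap(h, a1, a, a2):
    """Left-hand side minus right-hand side of (1); nonnegative when (1) holds."""
    lhs = h(a1) / np.sin(a1 - a) + h(a2) / np.sin(a - a2)
    rhs = np.sin(a1 - a2) / (np.sin(a1 - a) * np.sin(a - a2)) * h(a)
    return lhs - rhs

def h_abs_sin(theta):
    """h(theta) = |sin(theta)|, the example function from the text."""
    return np.abs(np.sin(theta))

rng = np.random.default_rng(0)
m = 10_000
a2 = rng.uniform(-np.pi, np.pi, size=m)
width = rng.uniform(0.05, np.pi - 0.05, size=m)    # alpha_1 - alpha_2, inside (0, pi)
a = a2 + rng.uniform(0.05, 0.95, size=m) * width   # alpha strictly between alpha_2 and alpha_1
a1 = a2 + width
min_gap = circle_convexity_gap(h_abs_sin, a1, a, a2).min()

# Ordinary convexity fails: the value at pi/2 exceeds the average of the
# values at the symmetric points pi/2 - 1 and pi/2 + 1.
midpoint_excess = h_abs_sin(np.pi / 2) - 0.5 * (
    h_abs_sin(np.pi / 2 - 1.0) + h_abs_sin(np.pi / 2 + 1.0)
)
```

Here `min_gap` stays nonnegative up to rounding, while `midpoint_excess` is strictly positive, which is exactly the failure of convexity described above.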


1.1 The Problem, Motivations, and Background 

We are now ready to describe the problem studied in this paper. Let $K^*$ be an unknown compact,
convex set in $\mathbb{R}^2$. We study the problem of estimating $K^*$ or $h_{K^*}$ from noisy measurements of $h_{K^*}$.
Specifically, we observe data $(\theta_1, Y_1), \dots, (\theta_n, Y_n)$ drawn according to the model

$$Y_i = h_{K^*}(\theta_i) + \xi_i \quad \text{for } i = 1, \dots, n \tag{2}$$

where $\theta_1, \dots, \theta_n$ are fixed grid points in $(-\pi, \pi]$ and $\xi_1, \dots, \xi_n$ are i.i.d. Gaussian random variables
with mean zero and known variance $\sigma^2$. We focus on the dual problems of estimating the scalar
quantity $h_{K^*}(\theta_i)$ for each $1 \le i \le n$ as well as the convex set $K^*$. We propose data-driven adaptive
estimators and establish their optimality for both of these problems.
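
To make the observation model (2) concrete, the sketch below simulates data for a polygon. It is a minimal illustration under stated assumptions: the helper names are hypothetical, the set is taken to be the convex hull of a finite list of vertices, and the grid is the uniform one $\theta_i = 2\pi i/n - \pi$ used in Section 2.

```python
import numpy as np

def support_polygon(vertices, theta):
    """Support function of the convex hull of `vertices` at angles `theta`.

    For a polytope the maximum in the definition of h_K is attained at a
    vertex, so it suffices to maximize over the vertex list.
    """
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    directions = np.stack([np.cos(theta), np.sin(theta)])          # shape (2, m)
    return (np.asarray(vertices, dtype=float) @ directions).max(axis=0)

def simulate_measurements(vertices, n, sigma, rng):
    """Draw (theta_i, Y_i), i = 1..n, from model (2) on the uniform grid."""
    i = np.arange(1, n + 1)
    theta = 2 * np.pi * i / n - np.pi                              # grid in (-pi, pi]
    y = support_polygon(vertices, theta) + sigma * rng.normal(size=n)
    return theta, y

square = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
theta, y = simulate_measurements(square, n=8, sigma=0.0, rng=np.random.default_rng(1))
```

With `sigma=0.0` the output is the exact support function of the square on the grid, which is convenient for checking downstream code before adding noise.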

The problem considered here has a range of applications in engineering. The regression model (2) 
was first proposed and studied by Prince and Willsky (1990) who were motivated by an application 
to Computed Tomography. Lele et al. (1992) showed how solutions to this problem can be applied to 
target reconstruction from resolved laser-radar measurements in the presence of registration errors. 
Gregor and Rannou (2002) considered application to Projection Magnetic Resonance Imaging. It 
is also a fundamental problem in the field of geometric tomography; see Gardner (2006). Another 
application domain where this problem might plausibly arise is robotic tactile sensing as has been
suggested by Prince and Willsky (1990). Finally this is a very natural shape constrained estimation 
problem and would fit right into the recent literature on shape constrained estimation. See, for 
example, Groeneboom and Jongbloed (2014). 

Most proposed procedures for estimating $K^*$ in this setting are based on least squares minimization.
The least squares estimator $\hat{K}_{ls}$ is defined as any minimizer of $\sum_{i=1}^n (Y_i - h_K(\theta_i))^2$ as $K$
ranges over all compact convex sets. The minimizer in this optimization problem is not unique and
one can always take it to be a polytope. This estimator was first proposed by Prince and Willsky
(1990) who also proposed an algorithm for computing it based on quadratic programming. Further
algorithms for computing $\hat{K}_{ls}$ were proposed in Gardner and Kiderlen (2009); Lele et al. (1992);
Prince and Willsky (1990).

The theoretical performance of the least squares estimator was first considered by Gardner et al.
(2006) who mainly studied its accuracy for estimating $K^*$ under the natural fixed design loss:

$$L_f(K^*, \hat{K}) := \frac{1}{n} \sum_{i=1}^n \left( h_{K^*}(\theta_i) - h_{\hat{K}}(\theta_i) \right)^2. \tag{3}$$

The key result of Gardner et al. (2006) (specialized to the planar case that we are studying) states
that $L_f(K^*, \hat{K}_{ls}) = O(n^{-4/5})$ as $n \to \infty$ almost surely provided $K^*$ is contained in a ball of bounded
radius. This result is complemented by the minimax lower bound in Guntuboyina (2011) where
it was shown that $n^{-4/5}$ is the minimax rate for this problem. These two results together imply
minimax optimality of $\hat{K}_{ls}$ under the loss function $L_f$. No other theoretical results for this problem
are available outside of those in Gardner et al. (2006) and Guntuboyina (2011).

As a result, the following basic questions are still unanswered: 

1. For a fixed $i \in \{1, \dots, n\}$, how does one optimally and adaptively estimate $h_{K^*}(\theta_i)$? This
is the pointwise estimation problem. In the literature on shape constrained estimation,
pointwise estimation has been the most studied problem. Several papers have been written
on this for monotonicity constrained estimation; prominent examples being Brunk (1970);
Carolan and Dykstra (1999); Cator (2011); Groeneboom (1983, 1985); Jankowski (2014);
Wright (1981) and convexity constrained estimation; prominent ones being Cai and Low
(2015); Groeneboom et al. (2001a,b); Hanson and Pledger (1976); Mammen (1991). For the
problem considered in this paper however, nothing is known about pointwise estimation. It
may be noted that the result $L_f(K^*, \hat{K}_{ls}) = O(n^{-4/5})$ of Gardner et al. (2006) does not say
anything about the accuracy of $h_{\hat{K}_{ls}}(\theta_i)$ as an estimator for $h_{K^*}(\theta_i)$.

2. How to construct minimax optimal estimators for the set $K^*$ that also adapt to polytopes?
Polytopes with a small number of extreme points have a much simpler structure than general
convex sets. In the problem of estimating convex sets under more standard observation models
different from the one studied here, it is possible to construct estimators that converge at
faster rates for polytopes compared to the overall minimax rate (see Brunel (2014) for a nice
summary of this theory). Similar kinds of adaptation have recently been studied for shape
constrained estimation problems based on monotonicity and convexity, see Baraud and Birgé
(2015); Chatterjee et al. (2014); Guntuboyina and Sen (2013). Based on these results, it is
natural to expect minimax estimators that adapt to polytopes in this problem. This has not
been addressed previously.

1.2 Our Contributions


We answer both the above questions in the affirmative in the present paper. The main contributions 
of this paper can be summarized in the following: 


1. We study the pointwise adaptive estimation problem in detail in the decision theoretic framework
where the focus is on the performance at every function, instead of the maximum
risk over a large parameter space. This framework, first introduced in Cai et al. (2013) and
Cai and Low (2015) for shape constrained regression, provides a much more precise characterization
of the performance of an estimator than the conventional minimax theory does.

In the context of the present problem, the difficulty of estimating $h_{K^*}(\theta_i)$ at a given $K^*$ and
$\theta_i$ can be expressed by means of a benchmark $R_n(K^*, \theta)$ which is defined as follows (below
$\mathbb{E}_L$ denotes expectation taken with respect to the joint distribution of $Y_1, \dots, Y_n$ generated
according to the model (2) with $K^*$ replaced by $L$):

$$R_n(K^*, \theta) = \sup_{L} \inf_{\hat{h}} \max\left( \mathbb{E}_{K^*}\big(\hat{h} - h_{K^*}(\theta)\big)^2,\ \mathbb{E}_{L}\big(\hat{h} - h_{L}(\theta)\big)^2 \right), \tag{4}$$

where the supremum above is taken over all compact, convex sets $L$ while the infimum is
over all estimators $\hat{h}$. In our first result for pointwise estimation, we establish, for each
$i \in \{1, \dots, n\}$, a lower bound for the performance of every estimator for estimating $h_{K^*}(\theta_i)$.
Specifically, it is shown that

$$R_n(K^*, \theta_i) \ge c \cdot \frac{\sigma^2}{k_*(i) + 1} \tag{5}$$

where $k_*(i)$ is an integer for which an explicit formula can be given in terms of $K^*$ and $i$, and
$c$ is a universal positive constant. It will turn out that $k_*(i)$ is related to the smoothness of
$h_{K^*}(\theta)$ at $\theta = \theta_i$.


We construct a data-driven estimator, $\hat{h}_i$, of $h_{K^*}(\theta_i)$ based on local smoothing together with
an optimization scheme for automatically choosing a bandwidth, and show that the estimator
$\hat{h}_i$ satisfies

$$\mathbb{E}_{K^*}\big(\hat{h}_i - h_{K^*}(\theta_i)\big)^2 \le C \cdot \frac{\sigma^2}{k_*(i) + 1} \tag{6}$$

for a universal positive constant $C$. Inequalities (5) and (6) together imply that $\hat{h}_i$ is, within
a universal constant factor, an optimal estimator of $h_{K^*}(\theta_i)$ for every compact, convex set
$K^*$. This optimality is stronger than the traditional minimax optimality usually employed in
nonparametric function estimation. The quantity $\sigma^2/(k_*(i) + 1)$ depends on the unknown set
$K^*$ in a similar way that the Fisher information bound depends on the unknown parameter in
a regular parametric model. In contrast, the optimal rate in the minimax paradigm is given
in terms of the worst case performance over a large parameter space and does not depend on
individual parameter values.

2. Using the optimal adaptive point estimators $\hat{h}_1, \dots, \hat{h}_n$, we construct two set estimators $\hat{K}$
and $\hat{K}'$. The details of this construction are given in Section 2.2. In Theorems 3.6 and 3.8,
we prove that $\hat{K}$ is minimax optimal for $K^*$ under the loss function $L_f$ while the estimator
$\hat{K}'$ is minimax optimal under the integral squared loss function defined by

$$L(K', K^*) := \int_{-\pi}^{\pi} \left( h_{K'}(\theta) - h_{K^*}(\theta) \right)^2 d\theta. \tag{7}$$

Specifically, Theorem 3.6 shows that

$$\mathbb{E}_{K^*} L_f(K^*, \hat{K}) \le C \left\{ \frac{\sigma^2}{n} + \left( \frac{\sigma^2 \sqrt{R}}{n} \right)^{4/5} \right\} \tag{8}$$

provided $K^*$ is contained in a ball of radius $R$. This, combined with the minimax lower bound
in Guntuboyina (2011), proves the minimax optimality of $\hat{K}$. An analogous result is shown
in Theorem 3.8 for $\mathbb{E}_{K^*} L(K^*, \hat{K}')$. For the pointwise estimation problem where the goal is
to estimate $h_{K^*}(\theta_i)$, the optimal rate $\sigma^2/(k_*(i) + 1)$ can be as large as $n^{-2/3}$. However the
bound (8) shows that globally the risk is at most $n^{-4/5}$. The shape constraint given by
convexity of $K^*$ ensures that the points where the pointwise estimation rate is $n^{-2/3}$ cannot be
too many. Note that we make no smoothness assumptions for proving (8).

3. We show that our set estimators $\hat{K}$ and $\hat{K}'$ adapt to polytopes with a bounded number of
extreme points. Already inequality (8) implies that $\mathbb{E}_{K^*} L_f(K^*, \hat{K})$ is bounded from above
by the parametric risk $C\sigma^2/n$ provided $R = 0$ (note that $R = 0$ means that $K^*$ is a singleton).
Because $\sigma^2/n$ is much smaller than $n^{-4/5}$, the bound (8) shows that $\hat{K}$ adapts to
singletons. Theorem 3.7 extends this adaptation phenomenon to polytopes and we show that
$\mathbb{E}_{K^*} L_f(K^*, \hat{K})$ is bounded by the parametric rate (up to a logarithmic multiplicative factor
of $n$) for all polytopes with a bounded number of extreme points. An analogous result is also
proved for $\mathbb{E}_{K^*} L(K^*, \hat{K}')$ in Theorem 3.8. It should be noted that the construction of our
estimators $\hat{K}$ and $\hat{K}'$ (described in Section 2.2) does not involve any special treatment for
polytopes; yet the estimators automatically achieve faster rates for polytopes.

We would like to stress two features of this paper: (a) we do not make any smoothness assumptions
on the boundary of $K^*$ throughout the paper; in particular, note that we obtain the
$n^{-4/5}$ rate for the set estimators $\hat{K}$ and $\hat{K}'$ without any smoothness assumptions, and (b) we go
beyond the traditional minimax paradigm by considering adaptive estimation in both the pointwise
estimation problem and the problem of estimating the entire set $K^*$.

1.3 Organization of the Paper 

The rest of the paper is structured as follows. The proposed estimators are described in detail
in Section 2. The theoretical properties of the estimators are analyzed in Section 3; Section 3.1
gives results for pointwise estimation while Section 3.2 deals with set estimators. In Section 4,
we investigate optimal estimation of some special compact convex sets $K^*$ where we explicitly
compute the associated rates of convergence. The proofs of the main results are given in Section 6
and additional technical results are relegated to Appendix A.


2 Estimation Procedures 

Recall the regression model (2), where we observe noisy measurements $(\theta_1, Y_1), \dots, (\theta_n, Y_n)$ with
$\theta_i = 2\pi i/n - \pi$, $i = 1, \dots, n$ being fixed grid points in $(-\pi, \pi]$. In this section, we first describe in
detail our estimate $\hat{h}_i$ for $h_{K^*}(\theta_i)$ for each $i$. Subsequently, we shall describe how to put together
these estimates $\hat{h}_1, \dots, \hat{h}_n$ to yield set estimators for $K^*$.


2.1 Estimators for $h_{K^*}(\theta_i)$ for each fixed $i$


Fix $1 \le i \le n$. Our construction of the estimator $\hat{h}_i$ for $h_{K^*}(\theta_i)$ is based on the key circle-convexity
property (1) of the function $h_{K^*}(\cdot)$. Let us define, for $0 < \phi < \pi/2$ and $\theta \in (-\pi, \pi]$, the following
two quantities:

$$l(\theta, \phi) := \cos\phi \, \big( h_{K^*}(\theta + \phi) + h_{K^*}(\theta - \phi) \big) - \frac{h_{K^*}(\theta + 2\phi) + h_{K^*}(\theta - 2\phi)}{2}$$

and

$$u(\theta, \phi) := \frac{h_{K^*}(\theta + \phi) + h_{K^*}(\theta - \phi)}{2 \cos\phi}.$$

The following lemma states that for every $\theta$, the quantity $h_{K^*}(\theta)$ is sandwiched between $l(\theta, \phi)$
and $u(\theta, \phi)$ for every $\phi$. This will be used crucially in defining $\hat{h}_i$. The proof of this lemma is a
straightforward consequence of (1) and is given in Appendix A.


Lemma 2.1. For every $0 < \phi < \pi/2$ and every $\theta \in (-\pi, \pi]$, we have $l(\theta, \phi) \le h_{K^*}(\theta) \le u(\theta, \phi)$.
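
One way to see how the lemma follows from (1) is sketched below. This is an illustrative derivation under the stated range $0 < \phi < \pi/2$ (so that $2\phi < \pi$ and $\cos\phi > 0$); it is not necessarily the argument given in Appendix A.

```latex
% Upper bound: apply (1) with \alpha_1 = \theta + \phi, \alpha = \theta, \alpha_2 = \theta - \phi.
\begin{align*}
\frac{h_{K^*}(\theta+\phi) + h_{K^*}(\theta-\phi)}{\sin\phi}
  &\ge \frac{\sin 2\phi}{\sin^2\phi}\, h_{K^*}(\theta)
   = \frac{2\cos\phi}{\sin\phi}\, h_{K^*}(\theta), \\
\text{hence}\quad h_{K^*}(\theta)
  &\le \frac{h_{K^*}(\theta+\phi) + h_{K^*}(\theta-\phi)}{2\cos\phi} = u(\theta,\phi).
\end{align*}
% Lower bound: the same three-point inequality, centred at \theta + \phi and at \theta - \phi.
\begin{align*}
h_{K^*}(\theta+2\phi) + h_{K^*}(\theta) &\ge 2\cos\phi\; h_{K^*}(\theta+\phi), \\
h_{K^*}(\theta) + h_{K^*}(\theta-2\phi) &\ge 2\cos\phi\; h_{K^*}(\theta-\phi).
\end{align*}
% Adding the two lines and dividing by 2:
\begin{align*}
h_{K^*}(\theta)
  \ge \cos\phi\,\bigl(h_{K^*}(\theta+\phi) + h_{K^*}(\theta-\phi)\bigr)
      - \frac{h_{K^*}(\theta+2\phi) + h_{K^*}(\theta-2\phi)}{2}
  = l(\theta,\phi).
\end{align*}
```

Both bounds hold with equality when $K^*$ is a single point, since then $h_{K^*}(\theta \pm \phi)$ sum to $2\cos\phi\, h_{K^*}(\theta)$.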


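For a fixed $1 \le i \le n$, Lemma 2.1 implies that $l(\theta_i, 2j\pi/n) \le h_{K^*}(\theta_i) \le u(\theta_i, 2j\pi/n)$ for every
$0 \le j < \lfloor n/4 \rfloor$. Note that when $j = 0$, we have $l(\theta_i, 0) = h_{K^*}(\theta_i) = u(\theta_i, 0)$. Averaging these
inequalities for $j = 0, 1, \dots, k$ where $k$ is a fixed integer with $0 \le k < \lfloor n/4 \rfloor$, we obtain

$$L_k(\theta_i) \le h_{K^*}(\theta_i) \le U_k(\theta_i) \quad \text{for every } 0 \le k < \lfloor n/4 \rfloor \tag{9}$$

where

$$L_k(\theta_i) := \frac{1}{k+1} \sum_{j=0}^{k} l(\theta_i, 2j\pi/n) \quad \text{and} \quad U_k(\theta_i) := \frac{1}{k+1} \sum_{j=0}^{k} u(\theta_i, 2j\pi/n).$$
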

We are now ready to describe our estimator. Fix $1 \le i \le n$. Inequality (9) says that the
quantity of interest, $h_{K^*}(\theta_i)$, is sandwiched between $L_k(\theta_i)$ and $U_k(\theta_i)$ for every $k$. Both $L_k(\theta_i)$
and $U_k(\theta_i)$ can naturally be estimated by unbiased estimators. Indeed, let

$$\hat{l}(\theta_i, 2j\pi/n) := \cos(2j\pi/n)\,(Y_{i+j} + Y_{i-j}) - \frac{Y_{i+2j} + Y_{i-2j}}{2}, \qquad \hat{u}(\theta_i, 2j\pi/n) := \frac{Y_{i+j} + Y_{i-j}}{2\cos(2j\pi/n)}$$

and take

$$\hat{L}_k(\theta_i) := \frac{1}{k+1} \sum_{j=0}^{k} \hat{l}(\theta_i, 2j\pi/n) \quad \text{and} \quad \hat{U}_k(\theta_i) := \frac{1}{k+1} \sum_{j=0}^{k} \hat{u}(\theta_i, 2j\pi/n). \tag{10}$$

Obviously, in order for the above to be meaningful, we need to define $Y_i$ even for $i \notin \{1, \dots, n\}$.
This is easily done in the following way: for any $i \in \mathbb{Z}$, let $s$ be such that $i - sn \in \{1, \dots, n\}$ and
take $Y_i := Y_{i - sn}$.


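As $k$ increases, one averages more terms in (10) and hence the estimators $\hat{L}_k(\theta_i)$ and $\hat{U}_k(\theta_i)$
become more accurate. Let

$$\hat{\Delta}_k(\theta_i) := \hat{U}_k(\theta_i) - \hat{L}_k(\theta_i) = \frac{1}{k+1} \sum_{j=0}^{k} \left( \frac{Y_{i+2j} + Y_{i-2j}}{2} - \frac{\cos(4j\pi/n)}{\cos(2j\pi/n)} \cdot \frac{Y_{i+j} + Y_{i-j}}{2} \right). \tag{11}$$
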


Because of (9), a natural strategy for estimating $h_{K^*}(\theta_i)$ is to choose $k$ for which $\hat{\Delta}_k(\theta_i)$ is the
smallest and then use either $\hat{L}_k(\theta_i)$ or $\hat{U}_k(\theta_i)$ at that $k$ as the estimator. This is essentially our
estimator with one small difference in that we also take into account the noise present in $\hat{\Delta}_k(\theta_i)$.
Formally, our estimator for $h_{K^*}(\theta_i)$ is given by:

$$\hat{h}_i := \hat{U}_{\hat{k}(i)}(\theta_i), \quad \text{where } \hat{k}(i) := \operatorname*{argmin}_{k \in \mathcal{I}} \left\{ \big( \hat{\Delta}_k(\theta_i) \big)_+ + \frac{2\sigma}{\sqrt{k+1}} \right\} \tag{12}$$

and $\mathcal{I} := \{0\} \cup \{2^j : j \ge 0 \text{ and } 2^j \le \lfloor n/16 \rfloor\}$.

Our estimator $\hat{h}_i$ can be viewed as an angle-adjusted local averaging estimator. It is inspired by
the estimator of Cai and Low (2015) for convex regression. The number of terms averaged equals
$\hat{k}(i) + 1$ and this is analogous to the bandwidth in kernel-based smoothing methods. Our $\hat{k}(i)$ is
determined from an optimization scheme. Notice that unlike the least squares estimator $h_{\hat{K}_{ls}}(\theta_i)$,
the construction of $\hat{h}_i$ for a fixed $i$ does not depend on the construction of $\hat{h}_j$ for $j \ne i$.
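
The construction above can be sketched in a few lines of code. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, indices are 0-based (position `i` holds $Y_{i+1}$), and the selection rule is assumed to penalize the positive part of $\hat{\Delta}_k(\theta_i)$ by $2\sigma/\sqrt{k+1}$ and to report $\hat{U}_k(\theta_i)$ at the selected $k$.

```python
import numpy as np

def candidate_bandwidths(n):
    """The set I = {0} U {2^j : j >= 0 and 2^j <= floor(n/16)}."""
    ks, k = [0], 1
    while k <= n // 16:
        ks.append(k)
        k *= 2
    return ks

def point_estimate(y, i, sigma):
    """Return (estimate of h_{K*}(theta_i), selected k) from observations y.

    `y` is a length-n NumPy array on the uniform grid; indices are wrapped
    modulo n, which implements the periodic extension of Y_i.
    """
    n = len(y)
    best = None
    for k in candidate_bandwidths(n):
        j = np.arange(k + 1)
        phi = 2 * np.pi * j / n
        near = y[(i + j) % n] + y[(i - j) % n]            # Y_{i+j} + Y_{i-j}
        far = y[(i + 2 * j) % n] + y[(i - 2 * j) % n]     # Y_{i+2j} + Y_{i-2j}
        u_hat = np.mean(near / (2 * np.cos(phi)))         # averaged upper estimates
        l_hat = np.mean(np.cos(phi) * near - far / 2)     # averaged lower estimates
        score = max(u_hat - l_hat, 0.0) + 2 * sigma / np.sqrt(k + 1)
        if best is None or score < best[0]:
            best = (score, k, u_hat)
    return best[2], best[1]

# Noise-free check on a single point K* = {(0.3, -0.7)}: every k gives the
# exact value, so the rule selects the largest bandwidth in I.
n = 64
theta = 2 * np.pi * np.arange(1, n + 1) / n - np.pi
y_point = 0.3 * np.cos(theta) - 0.7 * np.sin(theta)
est, k_sel = point_estimate(y_point, 10, sigma=1.0)
```

Because each $\hat{h}_i$ only reads a window of observations around index $i$, the estimates for different $i$ can be computed independently, in line with the remark above.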


2.2 Set Estimators for K* 

We next present estimators for the set $K^*$. The point estimators $\hat{h}_1, \dots, \hat{h}_n$ do not directly give
an estimator for $K^*$ because $(\hat{h}_1, \dots, \hat{h}_n)$ is not necessarily a valid support vector, i.e., $(\hat{h}_1, \dots, \hat{h}_n)$
does not always belong to the following set:

$$\mathcal{H} := \left\{ (h_K(\theta_1), \dots, h_K(\theta_n)) : K \subseteq \mathbb{R}^2 \text{ is compact and convex} \right\}.$$

To get a valid support vector from $(\hat{h}_1, \dots, \hat{h}_n)$, we need to project it onto $\mathcal{H}$ to obtain:

$$\hat{h}^P := (\hat{h}^P_1, \dots, \hat{h}^P_n) := \operatorname*{argmin}_{(h_1, \dots, h_n) \in \mathcal{H}} \sum_{i=1}^{n} \big( \hat{h}_i - h_i \big)^2. \tag{13}$$

The superscript $P$ here stands for projection. An estimator for the set $K^*$ can now be constructed
immediately from $\hat{h}^P_1, \dots, \hat{h}^P_n$ via

$$\hat{K} := \left\{ (x_1, x_2) : x_1 \cos\theta_i + x_2 \sin\theta_i \le \hat{h}^P_i \text{ for all } i = 1, \dots, n \right\}. \tag{14}$$

In Theorems 3.6 and 3.7, we prove upper bounds on the accuracy of $\hat{K}$ under the loss function $L_f$
defined in (3).
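
The set in (14) is an intersection of $n$ half-planes, so membership of a point can be tested directly once a projected vector is available. The sketch below assumes such a vector is given (the projection (13) itself is not implemented here); the function name and the tolerance are hypothetical choices for illustration.

```python
import numpy as np

def in_estimated_set(x, h_proj, theta, tol=1e-12):
    """Check whether x = (x1, x2) satisfies all n half-plane constraints in (14)."""
    x1, x2 = x
    return bool(np.all(x1 * np.cos(theta) + x2 * np.sin(theta) <= h_proj + tol))

# Illustration: the support vector of the unit disc on a grid of n = 8 angles.
n = 8
theta = 2 * np.pi * np.arange(1, n + 1) / n - np.pi
h_disc = np.ones(n)
inside = in_estimated_set((0.5, 0.5), h_disc, theta)
outside = in_estimated_set((1.05, 0.0), h_disc, theta)
```

With finitely many directions the resulting set is a polygon that contains the disc, which is why only points violating one of the sampled support lines are rejected.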


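There is another reasonable way of constructing a set estimator for $K^*$ based on the point
estimators $\hat{h}_1, \dots, \hat{h}_n$. We first interpolate $\hat{h}_1, \dots, \hat{h}_n$ to define a function $\hat{h}' : (-\pi, \pi] \to \mathbb{R}$ as
follows:

$$\hat{h}'(\theta) := \frac{\sin(\theta_{i+1} - \theta)}{\sin(\theta_{i+1} - \theta_i)}\, \hat{h}_i + \frac{\sin(\theta - \theta_i)}{\sin(\theta_{i+1} - \theta_i)}\, \hat{h}_{i+1} \quad \text{for } \theta_i \le \theta \le \theta_{i+1}. \tag{15}$$

Here $i$ ranges over $1, \dots, n$ with the convention that $\theta_{n+1} = \theta_1 + 2\pi$ (and $\theta_n \le \theta \le \theta_{n+1}$ should be
identified with $-\pi \le \theta \le -\pi + 2\pi/n$). Based on this function $\hat{h}'$, we can define our estimator $\hat{K}'$
of $K^*$ by

$$\hat{K}' := \operatorname*{argmin}_{K} \int_{-\pi}^{\pi} \left( \hat{h}'(\theta) - h_K(\theta) \right)^2 d\theta. \tag{16}$$

The existence and uniqueness of $\hat{K}'$ can be justified in the usual way by the Hilbert space projection
theorem. In Theorem 3.8, we prove bounds on the accuracy of $\hat{K}'$ as an estimator for $K^*$ under
the integral loss $L$ defined in (7).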
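
The interpolation step in (15) can be sketched as follows. This is an illustration only: the function name is hypothetical, the grid is the uniform one $\theta_i = 2\pi i/n - \pi$, values are stored 0-based (position `i - 1` holds the estimate at $\theta_i$), and the value at $\theta_{n+1}$ is taken to be the one at $\theta_1$, which is an assumption consistent with periodicity.

```python
import numpy as np

def interpolate_support(h_hat, theta):
    """Evaluate the sine interpolant of (15) at angles theta in (-pi, pi]."""
    h_hat = np.asarray(h_hat, dtype=float)
    n = len(h_hat)
    step = 2 * np.pi / n
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    t = np.mod(theta + np.pi, 2 * np.pi)            # offset from -pi, in [0, 2*pi)
    cell = np.minimum(np.floor(t / step).astype(int), n - 1)
    offset = t - cell * step                        # theta minus the left grid point
    left = h_hat[(cell - 1) % n]                    # value at the left grid point (wraps to theta_n)
    right = h_hat[cell % n]                         # value at the right grid point
    return (np.sin(step - offset) * left + np.sin(offset) * right) / np.sin(step)

# The interpolant reproduces a*cos(theta) + b*sin(theta) exactly, i.e. the
# support function of a single point, which gives a simple sanity check.
n = 12
grid = 2 * np.pi * np.arange(1, n + 1) / n - np.pi
h_grid = 0.3 * np.cos(grid) - 0.7 * np.sin(grid)
query = np.linspace(-3.0, 3.1, 50)
values = interpolate_support(h_grid, query)
```

Sine weights are used instead of linear weights because they match the equality case of (1); linear interpolation would not reproduce the support function of a point.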


3 Main Results 

We investigate in this section the accuracy of the proposed point and set estimators. The proofs 
of these results are given in Section 6. 

3.1 Accuracy of the Point Estimator 

As mentioned in the introduction, we evaluate the performance of the point estimator $\hat{h}_i$ at individual
functions, not the worst case over a large parameter space. This provides a much more precise
characterization of the accuracy of the estimator. Let us first recall inequality (9) where $h_{K^*}(\theta_i)$ is
sandwiched between $L_k(\theta_i)$ and $U_k(\theta_i)$. Define $\Delta_k(\theta_i) := U_k(\theta_i) - L_k(\theta_i)$.

Theorem 3.1. Fix $i \in \{1, \dots, n\}$. There exists a universal positive constant $C$ such that the risk
of $\hat{h}_i$ as an estimator of $h_{K^*}(\theta_i)$ satisfies the following inequality:

$$\mathbb{E}_{K^*}\big( \hat{h}_i - h_{K^*}(\theta_i) \big)^2 \le C \cdot \frac{\sigma^2}{k_*(i) + 1} \tag{17}$$

where

$$k_*(i) := \operatorname*{argmin}_{k \in \mathcal{I}} \left( \Delta_k(\theta_i) + \frac{2\sigma}{\sqrt{k+1}} \right). \tag{18}$$

Remark 3.1. It turns out that the bound in (17) is linked to the level of smoothness of the function
$h_{K^*}$ at $\theta_i$. However for this interpretation to be correct, one needs to regard $h_{K^*}$ as a function on
$\mathbb{R}^2$ instead of a subset of $\mathbb{R}$. This is further explained in Remark 4.1.







Theorem 3.1 gives an explicit bound on the risk of $\hat{h}_i$ in terms of the quantity $k_*(i)$ defined
in (18). It is important to keep in mind that $k_*(i)$ depends on $K^*$ even though this is suppressed
in the notation. In the next theorem, we show that $\sigma^2/(k_*(i) + 1)$ also presents a lower bound
on the accuracy of every estimator for $h_{K^*}(\theta_i)$. This implies, in particular, optimality of $\hat{h}_i$ as an
estimator of $h_{K^*}(\theta_i)$.

One needs to be careful in formulating the lower bound result in this setting. A first attempt
might perhaps be to prove that, for a universal positive constant $c$,

$$\inf_{\hat{h}} \mathbb{E}_{K^*}\big( \hat{h} - h_{K^*}(\theta_i) \big)^2 \ge c \cdot \frac{\sigma^2}{k_*(i) + 1}$$

where the infimum is over all possible estimators $\hat{h}$. This, of course, would not be possible because
one can take $\hat{h} = h_{K^*}(\theta_i)$ which would make the left hand side above zero. A formulation of
the lower bound which avoids this difficulty was proposed by Cai and Low (2015) in the context of
convex function estimation. Their idea, translated to our setting of estimating the support function
$h_{K^*}$ at a point, is to consider, instead of the risk at $K^*$, the maximum of the risk at $K^*$ and the
risk at $L^*$ which is most difficult to distinguish from $K^*$ in terms of estimating $h_{K^*}(\theta_i)$. This leads
to the benchmark $R_n(K^*, \theta_i)$ defined in (4).


Theorem 3.2. For any fixed $i \in \{1, \dots, n\}$, we have

$$R_n(K^*, \theta_i) \ge c \cdot \frac{\sigma^2}{k_*(i) + 1} \tag{19}$$

for a universal positive constant $c$.


Theorems 3.1 and 3.2 together imply that $\sigma^2/(k_*(i) + 1)$ is the optimal rate of estimation of
$h_{K^*}(\theta_i)$ for a given compact, convex set $K^*$. The results show that our data driven estimator $\hat{h}_i$ for
$h_{K^*}(\theta_i)$ performs uniformly within a constant factor of the ideal benchmark $R_n(K^*, \theta_i)$ for every
$i$. This means that $\hat{h}_i$ adapts to every unknown set $K^*$ instead of a collection of large parameter
spaces as in the conventional minimax theory commonly used in the nonparametric literature.

Given a specific set $K^*$ and $1 \le i \le n$, the quantity $k_*(i)$ is often straightforward to compute
up to constant multiplicative factors. Several examples are provided in Section 4. From these
examples, it will be clear that the size of $\sigma^2/(k_*(i) + 1)$ is linked to the level of smoothness of the
function $h_{K^*}$ at $\theta_i$. However for this interpretation to be correct, one needs to regard $h_{K^*}$ as a
function on $\mathbb{R}^2$ instead of a subset of $\mathbb{R}$. This is explained in Remark 4.1.

The following corollaries shed more light on the quantity $\sigma^2/(k_*(i) + 1)$. The first corollary below
shows that $\sigma^2/(k_*(i) + 1)$ is at most $C(\sigma^2 R/n)^{2/3}$ for every $i$ and $K^*$ ($C$ is a universal constant).
This implies, in particular, the consistency of $\hat{h}_i$ as an estimator for $h_{K^*}(\theta_i)$ for every $i$ and $K^*$. In
Example 4.3, we provide an explicit choice of $i$ and $K^*$ for which $\sigma^2/(k_*(i) + 1) \ge c(\sigma^2 R/n)^{2/3}$
($c$ is a universal constant). This implies that the conclusion of the following corollary cannot in
general be improved.

Corollary 3.3. Suppose $K^*$ is contained in some closed ball of radius $R$. Then for every $i \in \{1, \dots, n\}$, we have

$$\frac{\sigma^2}{k_*(i) + 1} \le C \left( \frac{\sigma^2 R}{n} \right)^{2/3} \tag{20}$$

and

$$\mathbb{E}_{K^*}\big( \hat{h}_i - h_{K^*}(\theta_i) \big)^2 \le C \left( \frac{\sigma^2 R}{n} \right)^{2/3} \tag{21}$$

for a universal positive constant $C$.


It is clear from the definition (18) that $k_*(i) \le n$ for all $i$ and $K^*$. In the next corollary, we
prove that there exist sets $K^*$ and $i$ for which $k_*(i) \ge cn$ for a constant $c$. For these sets, the
optimal rate of estimating $h_{K^*}(\theta_i)$ is therefore parametric.

For a fixed $i$ and $K^*$, let $\phi_1(i)$ and $\phi_2(i)$ be such that $\phi_1(i) \le \theta_i \le \phi_2(i)$ and such that there
exists a single point $(x_1, x_2) \in K^*$ with

$$h_{K^*}(\theta) = x_1 \cos\theta + x_2 \sin\theta \quad \text{for all } \theta \in [\phi_1(i), \phi_2(i)]. \tag{22}$$

The following corollary says that if the distance of $\theta_i$ to its nearest end-point in the interval
$[\phi_1(i), \phi_2(i)]$ is large (i.e., of constant order), then the optimal rate of estimation of $h_{K^*}(\theta_i)$ is
parametric. This situation happens usually for polytopes (polytopes are compact, convex sets with
finitely many vertices); see Examples 4.1 and 4.3 for specific instances of this phenomenon. For
non-polytopes, it can often happen that $\phi_1(i) = \phi_2(i) = \theta_i$ in which case the conclusion of the next
corollary is not useful.

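Corollary 3.4. For every $i \in \{1, \dots, n\}$, we have

$$k_*(i) \ge c\, n \min\big( \theta_i - \phi_1(i),\ \phi_2(i) - \theta_i,\ \pi \big) \tag{23}$$

for a universal positive constant $c$. Consequently

$$\mathbb{E}_{K^*}\big( \hat{h}_i - h_{K^*}(\theta_i) \big)^2 \le \frac{C \sigma^2}{1 + n \min\big( \theta_i - \phi_1(i),\ \phi_2(i) - \theta_i,\ \pi \big)} \tag{24}$$

for a universal positive constant $C$.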


From the above two corollaries, it is clear that the optimal rate of estimation of $h_{K^*}(\theta_i)$ can be
as large as $n^{-2/3}$ and as small as the parametric rate $n^{-1}$. The rate $n^{-2/3}$ is achieved, for example,
in the situation demonstrated in Example 4.3 while the parametric rate is achieved, for example,
for polytopes.

The next corollary argues that in order to bound $k_*(i)$ in specific examples, one only needs to
bound the quantity $\Delta_k(\theta_i)$ from above and below. This corollary will be very useful in Section 4
while working out $k_*(i)$ in specific examples.
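
Corollary 3.5. Fix $1 \le i \le n$. Let $\{f_k(\theta_i), k \in \mathcal{I}\}$ and $\{g_k(\theta_i), k \in \mathcal{I}\}$ be two sequences which
satisfy $g_k(\theta_i) \le \Delta_k(\theta_i) \le f_k(\theta_i)$ for all $k \in \mathcal{I}$. Also let

$$\breve{k}(i) := \max\left\{ k \in \mathcal{I} : f_k(\theta_i) < \frac{(\sqrt{6} - 2)\sigma}{\sqrt{k+1}} \right\} \tag{25}$$

and

$$\tilde{k}(i) := \min\left\{ k \in \mathcal{I} : g_k(\theta_i) > \frac{6(\sqrt{2} - 1)\sigma}{\sqrt{k+1}} \right\} \tag{26}$$

as long as there is some $k \in \mathcal{I}$ for which $g_k(\theta_i) > 6(\sqrt{2} - 1)\sigma/\sqrt{k+1}$; otherwise take $\tilde{k}(i) := \max_{k \in \mathcal{I}} k$.
We then have $\breve{k}(i) \le k_*(i) \le \tilde{k}(i)$ and

$$\mathbb{E}_{K^*}\big( \hat{h}_i - h_{K^*}(\theta_i) \big)^2 \le C \frac{\sigma^2}{\breve{k}(i) + 1} \tag{27}$$

for a universal positive constant $C$.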


3.2 Accuracy of Set Estimators 

We now turn to study the accuracy of the set estimators $\hat{K}$ (defined in (14)) and $\hat{K}'$ (defined in
(16)). The accuracy of $\hat{K}$ will be investigated under the loss function $L_f$ (defined in (3)) while the
accuracy of $\hat{K}'$ will be studied under the loss function $L$ (defined in (7)).

In Theorem 3.6 below, we prove that $\mathbb{E}_{K^*} L_f(K^*, \hat{K})$ is bounded from above by a constant
multiple of $n^{-4/5}$ as long as $K^*$ is contained in a ball of radius $R$. The discussions following the
theorem shed more light on its implications.

Theorem 3.6. If $K^*$ is contained in some closed ball of radius $R \ge 0$, we have

$$\mathbb{E}_{K^*} L_f(K^*, \hat{K}) \le C \left\{ \frac{\sigma^2}{n} + \left( \frac{\sigma^2 \sqrt{R}}{n} \right)^{4/5} \right\} \tag{28}$$

for a universal positive constant $C$. Note here that $R = 0$ is allowed (in which case $K^*$ is a
singleton).


Note that as long as $R > 0$, the right hand side in (28) will be dominated by the $(\sigma^2 \sqrt{R}/n)^{4/5}$
term for all large $n$. This would mean that

$$\sup_{K^* \in \mathcal{K}(R)} \mathbb{E}_{K^*} L_f(K^*, \hat{K}) \le C \left( \frac{\sigma^2 \sqrt{R}}{n} \right)^{4/5} \tag{29}$$

where $\mathcal{K}(R)$ denotes the set of all compact convex sets contained in some fixed closed ball of radius
$R$.

The minimax rate of estimation over the class $\mathcal{K}(R)$ was studied in Guntuboyina (2011). In
Guntuboyina (2011, Theorems 3.1 and 3.2), it was proved that

$$\inf_{\tilde{K}} \sup_{K^* \in \mathcal{K}(R)} \mathbb{E}_{K^*} L_f(K^*, \tilde{K}) \asymp \left( \frac{\sigma^2 \sqrt{R}}{n} \right)^{4/5} \tag{30}$$

where $\asymp$ denotes equality up to constant multiplicative factors. From (29) and (30), it follows that
$\hat{K}$ is a minimax optimal estimator of $K^*$. We should mention here that an inequality of the form
(29) was proved for the least squares estimator $\hat{K}_{ls}$ by Gardner et al. (2006) which implies that $\hat{K}_{ls}$
is also a minimax optimal estimator of $K^*$.

The $n^{-4/5}$ minimax rate here is quite natural in connection with estimation of smooth functions.
Indeed, this is the minimax rate of estimation of twice smooth one-dimensional functions. Although
we have not made any smoothness assumptions here, we are working under a convexity-based
constraint and convexity is associated, in a broad sense, with twice smoothness (see, for example,
Alexandrov (1939)).


Remark 3.2. Because of the formula (3) for the loss function $L_f$, the risk $\mathbb{E}_{K^*} L_f(K^*, \hat{K})$ can be
seen as the average of the risk of $\hat{K}$ for estimating $h_{K^*}(\theta_i)$ over $i = 1, \dots, n$. We have seen in
Section 3.1 that the optimal rate of estimating $h_{K^*}(\theta_i)$ can be as high as $n^{-2/3}$. Theorem 3.6, on
the other hand, can be interpreted as saying that, on average over $i = 1, \dots, n$, the optimal rate
of estimating $h_{K^*}(\theta_i)$ is at most $n^{-4/5}$. Indeed, the key to proving Theorem 3.6 is to establish the
following inequality:

$$\frac{1}{n} \sum_{i=1}^{n} \frac{\sigma^2}{k_*(i) + 1} \le C \left\{ \frac{\sigma^2}{n} + \left( \frac{\sigma^2 \sqrt{R}}{n} \right)^{4/5} \right\}$$

under the assumption that $K^*$ is contained in a ball of radius $R$. Therefore, even though each term
$\sigma^2/(k_*(i) + 1)$ can be as large as $n^{-2/3}$, on average, their size is at most $n^{-4/5}$.


Remark 3.3. Theorem 3.6 provides different qualitative conclusions when $K^*$ is a singleton. In
this case, one can take $R = 0$ in (28) to get the parametric bound $C\sigma^2/n$ for $\mathbb{E}_{K^*} L_f(K^*, \hat{K})$.
Because this is smaller than the nonparametric rate, it means that $\hat{K}$ adapts to singletons.

Singletons are simple examples of polytopes and one naturally wonders here if $\hat{K}$ also adapts to
other polytopes. This is however not implied by inequality (28) which gives the rate $n^{-4/5}$
for every $K^*$ that is not a singleton. It turns out that $\hat{K}$ indeed adapts to other polytopes and we
prove this in the next theorem. In fact, we prove that $\hat{K}$ adapts to any $K^*$ that is well-approximated
by a polytope with not too many vertices. It is currently not known if the least squares estimator
$\hat{K}_{ls}$ has such adaptive estimation properties.


In the next theorem, we prove another bound for $\mathbb{E}_{K^*} L_f(K^*, \hat{K})$. This bound demonstrates
adaptive estimation properties of $\hat{K}$ as described in the previous remark. Before stating the theorem,
we need some notation. Recall that polytopes are compact, convex sets with finitely many
extreme points (or vertices). The space of all polytopes in $\mathbb{R}^2$ will be denoted by $\mathcal{P}$. For a polytope
$P \in \mathcal{P}$, we denote by $v_P$ the number of extreme points of $P$. Also recall the notion of Hausdorff
distance between two compact, convex sets $K$ and $L$ defined by

$$\ell_H(K, L) := \sup_{\theta \in \mathbb{R}} \left| h_K(\theta) - h_L(\theta) \right|. \tag{31}$$

This is not the usual way of defining the Hausdorff distance. For an explanation of the connection
between this and the usual definition, see, for example, Schneider (1993, Theorem 1.8.11).


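Theorem 3.7. There exists a universal positive constant $C$ such that

$$\mathbb{E}_{K^*} L_f(K^*, \hat{K}) \le C \inf_{P \in \mathcal{P}} \left[ \frac{\sigma^2 v_P}{n} \log\left( \frac{en}{v_P} \right) + \ell_H^2(K^*, P) \right]. \tag{32}$$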


Remark 3.4 (Near-parametric rates for polytopes). The bound (32) implies that $\hat{K}$ has the parametric
rate (up to a logarithmic factor of $n$) for estimating polytopes. Indeed, suppose that $K^*$ is
a polytope with $v$ vertices. Then using $P = K^*$ in the infimum in (32), we have the risk bound

$$\mathbb{E}_{K^*} L_f(K^*, \hat{K}) \le C\, \frac{\sigma^2 v}{n} \log\left( \frac{en}{v} \right). \tag{33}$$

This is the parametric rate $\sigma^2 v/n$ up to logarithmic factors and is smaller than the nonparametric
rate given in (28).


Remark 3.5. When $v = 1$, inequality (33) has a redundant logarithmic factor. Indeed, when
$v = 1$, we can use (28) with $R = 0$ which gives (33) without the additional logarithmic factor. We
do not know if the logarithmic factor in (33) can be removed for values of $v$ larger than one as well.


We now turn to our second set estimator $\hat{K}'$. For this estimator, the next theorem provides an
upper bound on its accuracy under the integral loss function $L$ (defined in (7)). Qualitatively, the
bounds on $\mathbb{E}_{K^*} L(K^*, \hat{K}')$ given in the next theorem are similar to the bounds on $\mathbb{E}_{K^*} L_f(K^*, \hat{K})$
proved in Theorems 3.6 and 3.7.


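Theorem 3.8. Suppose $K^*$ is contained in some closed ball of radius $R \ge 0$. The risk $\mathbb{E}_{K^*} L(K^*, \hat{K}')$
satisfies both the following inequalities:

$$\mathbb{E}_{K^*} L(K^*, \hat{K}') \le C \left\{ \frac{\sigma^2}{n} + \left( \frac{\sigma^2 \sqrt{R}}{n} \right)^{4/5} + \frac{R^2}{n^2} \right\} \tag{34}$$

and

$$\mathbb{E}_{K^*} L(K^*, \hat{K}') \le C \inf_{P \in \mathcal{P}} \left\{ \frac{\sigma^2 v_P}{n} \log\left( \frac{en}{v_P} \right) + \ell_H^2(K^*, P) + \frac{R^2}{n^2} \right\}. \tag{35}$$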


The only difference between the inequalities (34) and (35) on one hand and (28) and (32) on
the other is the presence of the $R^2/n^2$ term. This term is usually very small and does not change
the qualitative behavior of the bounds. However note that inequality (32) did not require any
assumption on $K^*$ being in a ball of radius $R$ while this assumption is necessary for (35).

Remark 3.6. The rate $n^{-4/5}$ is the minimax rate for this problem under the loss function
$L$. Although this has not been proved explicitly anywhere, it can be shown by modifying the proof
of Guntuboyina (2011, Theorem 3.2) appropriately. Theorem 3.8 therefore shows that $\hat{K}'$ is a
minimax optimal estimator of $K^*$ under the loss function $L$.


4 Examples 


We now investigate the conclusions of the theorems of the previous section for specific choices of
$K^*$. For calculations in the following examples, it will be useful here to note that the quantity
$\Delta_k(\theta_i) = U_k(\theta_i) - L_k(\theta_i)$ has the following alternative expression:

$$\Delta_k(\theta_i) = \frac{1}{k+1} \sum_{j=0}^{k} \left( \frac{h_{K^*}(\theta_i + 4j\pi/n) + h_{K^*}(\theta_i - 4j\pi/n)}{2} - \frac{\cos(4j\pi/n)}{\cos(2j\pi/n)} \cdot \frac{h_{K^*}(\theta_i + 2j\pi/n) + h_{K^*}(\theta_i - 2j\pi/n)}{2} \right). \tag{36}$$

Example 4.1 (Single point). Suppose $K^* := \{(x_1, x_2)\}$ for a fixed point $(x_1, x_2) \in \mathbb{R}^2$. In this case

$$h_{K^*}(\theta) = x_1 \cos\theta + x_2 \sin\theta \quad \text{for all } \theta. \tag{37}$$

It can then be directly checked from (36) that $\Delta_k(\theta_i) = 0$ for every $k \in \mathcal{I}$ and $i \in \{1, \dots, n\}$. As a
result, it follows that $k_*(i) = \max_{k \in \mathcal{I}} k \ge cn$ for a positive constant $c$.

Theorem 3.1 then says that the point estimator hi satishes 


Ex* 



Ca^ 

n 


(38) 


for a universal positive constant C. One therefore gets the parametric rate here. 


Also, Theorem 3.6 and inequality (34) in Theorem 3.8 can both be used here with R = 0. This 
implies that the set estimators K and K' both converge to K* at the parametric rate under the 
loss functions Lf and L respectively. 


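Example 4.2 (Ball). Suppose $K^*$ is a ball centered at $(x_1, x_2)$ with radius $R > 0$. It is then easy
to verify that

$$h_{K^*}(\theta) = x_1 \cos\theta + x_2 \sin\theta + R \quad \text{for all } \theta. \qquad (39)$$

As a result, for every $k \in \mathcal{I}$ and $i \in \{1, \dots, n\}$, we have

$$\Delta_k(\theta_i) = \frac{R}{k+1} \sum_{j=0}^{k} \left( 1 - \frac{\cos(4\pi j/n)}{\cos(2\pi j/n)} \right) \le R \left( 1 - \frac{\cos(4\pi k/n)}{\cos(2\pi k/n)} \right) = \frac{R\,(1 + 2\cos(2\pi k/n))}{\cos(2\pi k/n)} \left( 1 - \cos(2\pi k/n) \right). \qquad (40)$$

Because $k \le n/16$ for all $k \in \mathcal{I}$, it is easy to verify that $\Delta_k(\theta_i) \le 8R\sin^2(\pi k/n) \le 8R\pi^2 k^2/n^2$.
Taking $f_k(\theta_i) = 8R\pi^2 k^2/n^2$ in Corollary 3.5, we obtain that $k_*(i) \ge c\,(n\sqrt{\sigma}/\sqrt{R})^{4/5}$ for a constant $c$.
Also, since the function $x \mapsto 1 - \cos(2x)/\cos(x)$ is strongly convex on $[-\pi/4, \pi/4]$ with second
derivative lower bounded by 3, we have

$$\Delta_k(\theta_i) = \frac{R}{k+1} \sum_{j=0}^{k} \left( 1 - \frac{\cos(4\pi j/n)}{\cos(2\pi j/n)} \right) \ge \frac{R}{k+1} \sum_{j=0}^{k} \frac{3}{2} \left( \frac{2\pi j}{n} \right)^2 = \frac{R\pi^2 k(2k+1)}{n^2}.$$

This gives $k_*(i) \le C\,(n\sqrt{\sigma}/\sqrt{R})^{4/5}$ as well for a constant $C$. We thus have $k_*(i) \asymp (n\sqrt{\sigma}/\sqrt{R})^{4/5}$ for
every $i$. Theorem 3.1 then gives

$$\mathbb{E}_{K^*} \left( \hat h_i - h_{K^*}(\theta_i) \right)^2 \le C \left( \frac{\sigma^2 \sqrt{R}}{n} \right)^{4/5} \quad \text{for every } i \in \{1, \dots, n\}. \qquad (41)$$

Theorem 3.6 and inequality (34) prove that the set estimators $\hat K$ and $\hat K'$ also converge to $K^*$ at
the $n^{-4/5}$ rate.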
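The two-sided bound on $\Delta_k(\theta_i)$ used in the ball example can be checked numerically. The sketch below is only an illustration under assumed values of $R$ and $n$; the helper names `ball_delta` and `ball_bounds` are hypothetical. It compares the closed form in (40) with the lower bound $R\pi^2 k(2k+1)/n^2$ and the upper bound $8R\sin^2(\pi k/n)$.

```python
import math


def ball_delta(R, k, n):
    """Closed form of Delta_k for a ball of radius R, i.e. the sum in (40)."""
    return R / (k + 1) * sum(
        1 - math.cos(4 * math.pi * j / n) / math.cos(2 * math.pi * j / n)
        for j in range(k + 1)
    )


def ball_bounds(R, k, n):
    """Lower and upper bounds quoted in the ball example (valid for k <= n/16)."""
    lower = R * math.pi ** 2 * k * (2 * k + 1) / n ** 2
    upper = 8 * R * math.sin(math.pi * k / n) ** 2
    return lower, upper


R, n = 1.5, 512  # assumed values for illustration
checks = []
for k in [1, 2, 4, 8, 16, 32]:  # powers of two up to n/16
    lower, upper = ball_bounds(R, k, n)
    checks.append(lower <= ball_delta(R, k, n) <= upper)
print(checks)
```

Both bounds scale like $Rk^2/n^2$, which is what produces $k_*(i) \asymp (n\sqrt{\sigma}/\sqrt{R})^{4/5}$ when balanced against $2\sigma/\sqrt{k+1}$.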


In the preceding examples, we saw that the optimal rate $\sigma^2/(k_*(i) + 1)$ for estimating $h_{K^*}(\theta_i)$
did not depend on $i$. Next, we consider asymmetric examples where the rate changes with $i$.

Example 4.3 (Segment). Suppose $K^*$ is the vertical line segment joining the two points $(0, R)$
and $(0, -R)$ for a fixed $R > 0$. One then gets $h_{K^*}(\theta) = R|\sin\theta|$ for all $\theta$. For simplicity, assume
that $n$ is even and consider $i = n/2$ so that $\theta_{n/2} = 0$. It can then be verified that

$$\Delta_k(\theta_{n/2}) = \Delta_k(0) = \frac{R}{k+1} \sum_{j=0}^{k} \tan\frac{2\pi j}{n}$$

for every $k \in \mathcal{I}$.


Because $j \mapsto \tan(2\pi j/n)$ is increasing, we get

$$\frac{3\pi R k}{8n} \le \frac{R}{k+1} \left( \frac{3k}{4} + 1 \right) \tan\frac{2\pi k}{4n} \le \Delta_k(0) \le R\tan\frac{2\pi k}{n} \le 2R\sin\frac{2\pi k}{n} \le \frac{4\pi R k}{n}.$$

Corollary 3.5 then gives

$$\frac{\sigma^2}{k_*(n/2) + 1} \asymp \left( \frac{\sigma^2 R}{n} \right)^{2/3}. \qquad (42)$$

It was shown in Corollary 3.3 that the right hand side above represents the maximum possible
value of $\sigma^2/(k_*(i) + 1)$ when $K^*$ lies in a closed ball of radius $R$. Therefore this example presents
the situation where estimation of $h_{K^*}(\theta_i)$ is the most difficult. See Remark 4.1 for the connection
to smoothness of $h_{K^*}(\cdot)$ at $\theta_i$.



Now suppose that $i = 3n/4$ (assume that $n/4$ is an integer for simplicity) so that $\theta_i = \pi/2$.
Observe then that $h_{K^*}(\theta) = R\sin\theta$ (without the modulus) for $\theta = \theta_i \pm 4j\pi/n$ for every $0 \le j \le k$,
$k \in \mathcal{I}$. Using (36), we have $\Delta_k(\theta_i) = 0$ for every $k \in \mathcal{I}$. This immediately gives $k_*(i) = \lfloor n/16 \rfloor$
and hence

$$\frac{\sigma^2}{k_*(3n/4) + 1} \asymp \frac{\sigma^2}{n}. \qquad (43)$$

In this example, the risk for estimating $h_{K^*}(\theta_i)$ changes with $i$. For $i = n/2$, we get the rate
$n^{-2/3}$ while for $i = 3n/4$, we get the parametric rate. For other values of $i$, one gets a range of rates
between $n^{-2/3}$ and $n^{-1}$.

Because $K^*$ is a polytope with 2 vertices, Theorem 3.7 and inequality (35) imply that the set
estimators $\hat K$ and $\hat K'$ converge at the near parametric rate $\log n/n$. It is interesting to note here
that even though for some $\theta_i$ the optimal rate of estimation of $h_{K^*}(\theta_i)$ is $n^{-2/3}$, the entire set can
be estimated at the near parametric rate.


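Example 4.4 (Half-ball). Suppose $K^* := \{(x_1, x_2) : x_1^2 + x_2^2 \le 1,\ x_2 \le 0\}$. One then has $h_{K^*}(\theta) = 1$
for $-\pi \le \theta \le 0$ and $h_{K^*}(\theta) = |\cos\theta|$ for $0 < \theta \le \pi$. Assume $n$ is even and take $i = n/2$ so that
$\theta_i = 0$. Then

$$\Delta_k(0) = \frac{1}{k+1} \sum_{j=0}^{k} \left( \frac{\cos(4\pi j/n) + 1}{2} - \frac{\cos(4\pi j/n)}{\cos(2\pi j/n)} \cdot \frac{\cos(2\pi j/n) + 1}{2} \right) = \frac{1}{2(k+1)} \sum_{j=0}^{k} \left( 1 - \frac{\cos(4\pi j/n)}{\cos(2\pi j/n)} \right).$$

This is exactly as in (40) with $R = 1$ and an additional factor of $1/2$. Arguing as in Example 4.2,
we obtain that

$$\frac{\sigma^2}{k_*(n/2) + 1} \asymp \left( \frac{\sigma^2}{n} \right)^{4/5}.$$

Now take $i = 3n/4$ (assume $n/4$ is an integer) so that $\theta_i = \pi/2$. Observe then that $h_{K^*}(\theta) = |\cos\theta|$
for $\theta = \theta_i \pm 4j\pi/n$ for every $0 \le j \le k$, $k \in \mathcal{I}$. The situation is therefore similar to (42) and we
obtain

$$\frac{\sigma^2}{k_*(3n/4) + 1} \asymp \left( \frac{\sigma^2}{n} \right)^{2/3}.$$

Similar to the previous example, the risk for estimating $h_{K^*}(\theta_i)$ changes with $i$ and varies from
$n^{-2/3}$ to $n^{-4/5}$. On the other hand, Theorem 3.6 states that the set estimator $\hat K$ still estimates $K^*$
at the rate $n^{-4/5}$.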


Remark 4.1 (Connection between risk and smoothness). The reader may observe that the support
functions (37) and (39) in Examples 4.1 and 4.2 differ only by the constant $R$. It might then
seem strange that only the addition of a non-zero constant changes the risk of estimating $h_{K^*}(\theta_i)$
from $n^{-1}$ to $n^{-4/5}$. It turns out that the function (37) is much smoother than the function
(39); the right way to view smoothness of $h_{K^*}(\cdot)$ is to regard it as a function on $\mathbb{R}^2$. This is done
in the following way. Define, for each $z = (z_1, z_2) \in \mathbb{R}^2$,

$$h_{K^*}(z) = \max_{(x_1, x_2) \in K^*} (x_1 z_1 + x_2 z_2).$$

When $z = (\cos\theta, \sin\theta)$ for some $\theta \in \mathbb{R}$, this definition coincides with our definition of $h_{K^*}(\theta)$. A
standard result (see for example Corollary 1.7.3 and Theorem 1.7.4 in Schneider (1993)) states that
the subdifferential of $z \mapsto h_{K^*}(z)$ exists at every $z = (z_1, z_2) \in \mathbb{R}^2$ and is given by

$$F(K^*, z) := \{(x_1, x_2) \in K^* : h_{K^*}(z) = x_1 z_1 + x_2 z_2\}.$$

In particular, $z \mapsto h_{K^*}(z)$ is differentiable at $z$ if and only if $F(K^*, z)$ is a singleton.

This point of view of studying $h_{K^*}$ as a function on $\mathbb{R}^2$ sheds qualitative light on the risk
bounds obtained in the examples. In the case of Example 4.1 when $K^* = \{(x_1, x_2)\}$, it is clear that
$F(K^*, z) = \{(x_1, x_2)\}$ for all $z$. Because this set does not change with $z$, this provides the case of
maximum smoothness (because the derivative is constant) and thus we get the $n^{-1}$ rate.

In Example 4.2 when $K^*$ is a ball centered at $x = (x_1, x_2)$ with radius $R$, it can be checked
that $F(K^*, z) = \{x + Rz/\|z\|\}$ for every $z \ne 0$. Since $F(K^*, z)$ is a singleton for each $z \ne 0$, it
follows that $z \mapsto h_{K^*}(z)$ is differentiable for every $z \ne 0$. For $R \ne 0$, the set $F(K^*, z)$ changes with $z$
and thus here $h_{K^*}$ is not as smooth as in Example 4.1. This explains the slower rate in Example
4.2 compared to 4.1.

Finally in Example 4.3, when $K^*$ is the vertical segment joining $(0, R)$ and $(0, -R)$, it is easy
to see that $F(K^*, z) = K^*$ when $z = (1, 0)$. Here $F(K^*, z)$ is not a singleton, which implies that
$h_{K^*}(z)$ is non-differentiable at $z = (1, 0)$. This is why one gets the slow rate $n^{-2/3}$ for estimating
$h_{K^*}(\theta_{n/2})$ in Example 4.3.


5 Discussions 

In this paper we study the problems of estimating both the support function at a point, $h_{K^*}(\theta_i)$,
and the convex set $K^*$. Data-driven adaptive estimators are constructed and their optimality
is established. For pointwise estimation, the quantity $k_*(i)$, which appears in both the upper
bound (17) and the lower bound (19), is related to the smoothness of $h_{K^*}(\theta)$ at $\theta = \theta_i$. The
construction of $\hat h_i$ is based on local smoothing together with an optimization scheme for choosing
the bandwidth. Smoothing methods for estimating the support function have previously been
studied by Fisher et al. (1997). Specifically, working under certain smoothness assumptions on the
true support function $h_{K^*}(\theta)$, Fisher et al. (1997) estimated it using periodic versions of standard
nonparametric regression techniques such as local regression, kernel smoothing and splines. They
evade the problem of bandwidth selection, however, by assuming that the true support function is
sufficiently smooth. Our estimator comes with a scheme for choosing the bandwidth automatically
from the data and hence we do not need any smoothness assumptions on the true convex set.

To avoid complications, we have assumed throughout the paper that the noise level $\sigma$ is known.
In practice, $\sigma$ is typically unknown and needs to be estimated. Under the setting of the present
paper, $\sigma$ is easily estimable by using the median of the consecutive differences. Let $\delta_i = Y_{2i} - Y_{2i-1}$,
$i = 1, \dots, \lfloor n/2 \rfloor$. A simple robust estimator of the noise level $\sigma$ is the following median
absolute deviation (MAD) estimator:

$$\hat\sigma = \frac{\operatorname{median}\,|\delta_i - \operatorname{median}(\delta_i)|}{1.349}.$$
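A minimal Python sketch of this MAD estimator is given below. The function name `mad_sigma` and the toy data are hypothetical; the normalizing constant 1.349 is taken verbatim from the displayed formula and exposed as a parameter rather than re-derived here.

```python
import statistics


def mad_sigma(y, scale=1.349):
    """MAD-type estimate of the noise level from consecutive differences.

    `scale` defaults to the constant 1.349 quoted in the text.
    """
    # delta_i = Y_{2i} - Y_{2i-1} for i = 1, ..., floor(n/2) (1-based in the text);
    # with 0-based Python lists this is y[2i+1] - y[2i].
    deltas = [y[2 * i + 1] - y[2 * i] for i in range(len(y) // 2)]
    centre = statistics.median(deltas)
    return statistics.median([abs(d - centre) for d in deltas]) / scale


# Toy observations (hypothetical numbers, only to show the mechanics).
y = [1.0, 1.5, 2.0, 1.0, 0.0, 3.0, 2.0, 2.5]
print(mad_sigma(y))
```

Differencing neighbouring observations removes most of the slowly varying signal $h_{K^*}(\theta_i)$, so the spread of the differences mainly reflects the noise.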

It was noted that the construction of our estimators $\hat K$ and $\hat K'$ given in Section 2.2 does not
involve any special treatment for polytopes; yet we obtain faster rates for polytopes. Such automatic
adaptation to polytopes has been observed in other contexts: isotonic regression where one gets
automatic adaptation for piecewise constant monotone functions (see Chatterjee et al. (2014)) and
convex regression where one gets automatic adaptation for piecewise affine convex functions (see
Guntuboyina and Sen (2013)).

Finally, we note that because $\sigma^2/(k_*(i) + 1)$ gives the optimal rate in pointwise estimation, it
can potentially be used as a benchmark to evaluate other estimators for $h_{K^*}(\theta_i)$ such as the least
squares estimator at $\theta_i$. This however is beyond the scope of the current paper.


6 Proofs of the main results 

We prove the main results in this section. Additional technical results and proofs are given in 
Appendix A. 

6.1 Proof of Theorem 3.1 

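We provide the proof of Theorem 3.1 here. The proof uses three simple lemmas, Lemmas A.1, A.2
and A.3, which are stated and proved in Appendix A.

Fix $i \in \{1, \dots, n\}$. Because $\hat h_i = \hat U_{\hat k(i)}(\theta_i)$, we write

$$\left( \hat h_i - h_{K^*}(\theta_i) \right)^2 = \sum_{k \in \mathcal{I}} \left( \hat U_k(\theta_i) - h_{K^*}(\theta_i) \right)^2 I\{\hat k(i) = k\}$$

where $I\{\cdot\}$ denotes the indicator function. Taking expectations on both sides and using the
Cauchy-Schwarz inequality, we obtain

$$\mathbb{E}_{K^*} \left( \hat h_i - h_{K^*}(\theta_i) \right)^2 \le \sum_{k \in \mathcal{I}} \sqrt{\mathbb{E}_{K^*} \left( \hat U_k(\theta_i) - h_{K^*}(\theta_i) \right)^4} \sqrt{\mathbb{P}_{K^*}\{\hat k(i) = k\}}.$$

The random variable $\hat U_k(\theta_i) - h_{K^*}(\theta_i)$ is normally distributed and we know that $\mathbb{E}Z^4 \le 3(\mathbb{E}Z^2)^2$ for
every Gaussian random variable $Z$. We therefore have

$$\mathbb{E}_{K^*} \left( \hat h_i - h_{K^*}(\theta_i) \right)^2 \le \sqrt{3} \sum_{k \in \mathcal{I}} \mathbb{E}_{K^*} \left( \hat U_k(\theta_i) - h_{K^*}(\theta_i) \right)^2 \sqrt{\mathbb{P}_{K^*}\{\hat k(i) = k\}}.$$

Because $\mathbb{E}_{K^*} \hat U_k(\theta_i) = U_k(\theta_i)$ (defined in (9)), we have

$$\mathbb{E}_{K^*} \left( \hat U_k(\theta_i) - h_{K^*}(\theta_i) \right)^2 = \left( U_k(\theta_i) - h_{K^*}(\theta_i) \right)^2 + \operatorname{var}(\hat U_k(\theta_i)).$$

Because $L_k(\theta_i) \le h_{K^*}(\theta_i) \le U_k(\theta_i)$, it is clear that $U_k(\theta_i) - h_{K^*}(\theta_i) \le U_k(\theta_i) - L_k(\theta_i) = \Delta_k(\theta_i)$.
Also, Lemma A.3 states that the variance of $\hat U_k(\theta_i)$ is at most $\sigma^2/(k + 1)$. Putting these together, we
obtain

$$\mathbb{E}_{K^*} \left( \hat h_i - h_{K^*}(\theta_i) \right)^2 \le \sqrt{3} \sum_{k \in \mathcal{I}} \left( \Delta_k^2(\theta_i) + \frac{\sigma^2}{k+1} \right) \sqrt{\mathbb{P}_{K^*}\{\hat k(i) = k\}}.$$

The proof of (17) will therefore be complete if we show that

$$\sum_{k \in \mathcal{I}} \left( \Delta_k^2(\theta_i) + \frac{\sigma^2}{k+1} \right) \sqrt{\mathbb{P}_{K^*}\{\hat k(i) = k\}} \le \frac{C\sigma^2}{k_*(i) + 1} \qquad (44)$$

for a universal positive constant $C$.

Below, we write $\Delta_k$, $\hat k$ and $k_*$ for $\Delta_k(\theta_i)$, $\hat k(i)$ and $k_*(i)$ respectively for ease of notation. We
also write $\mathbb{P}$ for $\mathbb{P}_{K^*}$.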

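We prove (44) by considering the two cases $k \le k_*$, $k \in \mathcal{I}$ and $k > k_*$, $k \in \mathcal{I}$ separately.

The first case is $k \le k_*$, $k \in \mathcal{I}$. By Lemma A.1 and (88), we get

$$\Delta_k \le \Delta_{k_*} \le \frac{6(\sqrt{2} - 1)\sigma}{\sqrt{k_* + 1}} \le \frac{6(\sqrt{2} - 1)\sigma}{\sqrt{k + 1}}$$

and consequently

$$\Delta_k^2 + \frac{\sigma^2}{k+1} \le \left( 36(\sqrt{2} - 1)^2 + 1 \right) \frac{\sigma^2}{k+1} \quad \text{for all } k \le k_*,\ k \in \mathcal{I}. \qquad (45)$$

We bound $\mathbb{P}\{\hat k = k\}$ by writing

$$\mathbb{P}\{\hat k = k\} \le \mathbb{P}\left\{ (\hat\Delta_k)_+ + \frac{2\sigma}{\sqrt{k+1}} \le (\hat\Delta_{k_*})_+ + \frac{2\sigma}{\sqrt{k_*+1}} \right\} \le \mathbb{P}\left\{ (\hat\Delta_{k_*})_+ \ge \frac{2\sigma}{\sqrt{k+1}} - \frac{2\sigma}{\sqrt{k_*+1}} \right\}.$$

For $k < k_*$, the positive part above can be dropped and we obtain

$$\mathbb{P}\{\hat k = k\} \le \mathbb{P}\left\{ \hat\Delta_{k_*} \ge \frac{2\sigma}{\sqrt{k+1}} - \frac{2\sigma}{\sqrt{k_*+1}} \right\}.$$

Because $\hat\Delta_{k_*}$ is normally distributed with mean $\Delta_{k_*}$, we have

$$\mathbb{P}\{\hat k = k\} \le \mathbb{P}\left\{ Z \ge \frac{2\sigma(k+1)^{-1/2} - 2\sigma(k_*+1)^{-1/2} - \Delta_{k_*}}{\sqrt{\operatorname{var}(\hat\Delta_{k_*})}} \right\}$$

where $Z$ is a standard normal random variable. From (88), we have

$$\frac{2\sigma}{\sqrt{k+1}} - \frac{2\sigma}{\sqrt{k_*+1}} - \Delta_{k_*} \ge \frac{2\sigma}{\sqrt{k+1}} \left( 1 - \sqrt{\frac{k+1}{k_*+1}}\,(3\sqrt{2} - 2) \right).$$

As a result,

$$\mathbb{P}\{\hat k = k\} \le \mathbb{P}\left\{ Z \ge \frac{2\sigma}{\sqrt{(k+1)\operatorname{var}(\hat\Delta_{k_*})}} \left( 1 - \sqrt{\frac{k+1}{k_*+1}}\,(3\sqrt{2} - 2) \right) \right\}.$$

Suppose

$$\tilde k := (k_* + 1)(3\sqrt{2} - 2)^{-2} - 1.$$

For $k < \tilde k$, we use the bound given by Lemma A.3 on the variance of $\hat\Delta_{k_*}$ to obtain

$$\mathbb{P}\{\hat k = k\} \le \mathbb{P}\left\{ Z \ge 2\left( \sqrt{\frac{k_*+1}{k+1}} - 3\sqrt{2} + 2 \right) \right\} \le \exp\left( -2\left( \sqrt{\frac{k_*+1}{k+1}} - 3\sqrt{2} + 2 \right)^2 \right).$$

Using this and (45), we see that the quantity

$$\sum_{k < \tilde k,\ k \in \mathcal{I}} \left( \Delta_k^2 + \frac{\sigma^2}{k+1} \right) \sqrt{\mathbb{P}\{\hat k = k\}}$$

is bounded from above by

$$\frac{\sigma^2}{k_*+1} \left( 36(\sqrt{2} - 1)^2 + 1 \right) \sum_{k < \tilde k,\ k \in \mathcal{I}} \frac{k_*+1}{k+1} \exp\left( -\left( \sqrt{\frac{k_*+1}{k+1}} - 3\sqrt{2} + 2 \right)^2 \right).$$

Because $\mathcal{I}$ consists of integers of the form $2^j$, it follows that for any two successive integers $k_1 > k_2$
in $\mathcal{I}$, we have $3/2 \le (k_1 + 1)/(k_2 + 1) \le 2$. Using this, it is easily seen that

$$\sum_{k < \tilde k,\ k \in \mathcal{I}} \frac{k_*+1}{k+1} \exp\left( -\left( \sqrt{\frac{k_*+1}{k+1}} - 3\sqrt{2} + 2 \right)^2 \right)$$

is bounded from above by

$$\sum_{j \ge 4} 2^j \exp\left( -\left( (3/2)^{j/2} - 3\sqrt{2} + 2 \right)^2 \right) + \sum_{0 \le j \le 3} 2^j,$$

which is just a universal positive constant. We have proved therefore that

$$\sum_{k < \tilde k,\ k \in \mathcal{I}} \left( \Delta_k^2 + \frac{\sigma^2}{k+1} \right) \sqrt{\mathbb{P}\{\hat k = k\}} \le \frac{C_1\sigma^2}{k_*+1} \qquad (46)$$

for a positive constant $C_1$.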

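For $\tilde k \le k \le k_*$, we simply use (45) along with the trivial bound $\mathbb{P}\{\hat k = k\} \le 1$ to get

$$\sum_{\tilde k \le k \le k_*,\ k \in \mathcal{I}} \left( \Delta_k^2 + \frac{\sigma^2}{k+1} \right) \sqrt{\mathbb{P}\{\hat k = k\}} \le \left( 36(\sqrt{2} - 1)^2 + 1 \right) \frac{\sigma^2}{k_*+1} \sum_{\tilde k \le k \le k_*,\ k \in \mathcal{I}} \frac{k_*+1}{k+1}.$$

Once again because $\mathcal{I}$ consists of integers of the form $2^j$, we get

$$\sum_{\tilde k \le k \le k_*,\ k \in \mathcal{I}} \frac{k_*+1}{k+1} \le \frac{k_*+1}{\tilde k+1} \sum_{j \ge 0} (3/2)^{-j} = 3(3\sqrt{2} - 2)^2.$$

The right hand side above is just a constant. It follows therefore that

$$\sum_{\tilde k \le k \le k_*,\ k \in \mathcal{I}} \left( \Delta_k^2 + \frac{\sigma^2}{k+1} \right) \sqrt{\mathbb{P}\{\hat k = k\}} \le \frac{C_2\sigma^2}{k_*+1} \qquad (47)$$

for a positive constant $C_2$. Combining (46) and (47), we deduce that

$$\sum_{k \le k_*,\ k \in \mathcal{I}} \left( \Delta_k^2 + \frac{\sigma^2}{k+1} \right) \sqrt{\mathbb{P}\{\hat k = k\}} \le \frac{C\sigma^2}{k_*+1} \qquad (48)$$

where $C := C_1 + C_2$ is a universal positive constant.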

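We next deal with the case $k > k_*$, $k \in \mathcal{I}$. Assume that $\{k \in \mathcal{I} : k > k_*\}$ is non-empty for
otherwise there is nothing to prove. By the first part of (89), we get

$$\sum_{k > k_*,\ k \in \mathcal{I}} \left( \Delta_k^2 + \frac{\sigma^2}{k+1} \right) \sqrt{\mathbb{P}\{\hat k = k\}} \le \left( 1 + \frac{1}{(\sqrt{6} - 2)^2} \right) \sum_{k > k_*,\ k \in \mathcal{I}} \Delta_k^2 \sqrt{\mathbb{P}\{\hat k = k\}}. \qquad (49)$$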

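We first bound $\mathbb{P}\{\hat k = k\}$ for $k > k_*$, $k \in \mathcal{I}$. We proceed by writing

$$\begin{aligned}
\mathbb{P}\{\hat k = k\} &\le \mathbb{P}\left\{ (\hat\Delta_k)_+ + \frac{2\sigma}{\sqrt{k+1}} \le (\hat\Delta_{k_*})_+ + \frac{2\sigma}{\sqrt{k_*+1}} \right\} \\
&\le \mathbb{P}\left\{ \hat\Delta_k + \frac{2\sigma}{\sqrt{k+1}} \le (\hat\Delta_{k_*})_+ + \frac{2\sigma}{\sqrt{k_*+1}} \right\} \qquad (\text{because } x \le x_+) \\
&\le \mathbb{P}\left\{ \hat\Delta_k + \frac{2\sigma}{\sqrt{k+1}} \le \hat\Delta_{k_*} + \frac{2\sigma}{\sqrt{k_*+1}} \right\} + \mathbb{P}\left\{ \hat\Delta_k + \frac{2\sigma}{\sqrt{k+1}} \le \frac{2\sigma}{\sqrt{k_*+1}} \right\} \\
&\le \mathbb{P}\left\{ \hat\Delta_{k_*} - \hat\Delta_k \ge -\frac{2\sigma}{\sqrt{k_*+1}} \right\} + \mathbb{P}\left\{ -\hat\Delta_k \ge -\frac{2\sigma}{\sqrt{k_*+1}} \right\}.
\end{aligned}$$

Both $\hat\Delta_{k_*} - \hat\Delta_k$ and $\hat\Delta_k$ are normally distributed with means $\Delta_{k_*} - \Delta_k$ and $\Delta_k$ respectively. As a
result,

$$\mathbb{P}\{\hat k = k\} \le \mathbb{P}\left\{ Z \ge \frac{\Delta_k - \Delta_{k_*} - 2\sigma(k_*+1)^{-1/2}}{\sqrt{\operatorname{var}(\hat\Delta_{k_*} - \hat\Delta_k)}} \right\} + \mathbb{P}\left\{ Z \ge \frac{\Delta_k - 2\sigma(k_*+1)^{-1/2}}{\sqrt{\operatorname{var}(\hat\Delta_k)}} \right\}$$

where $Z$ is a standard normal random variable. Using (88), we obtain

$$\mathbb{P}\{\hat k = k\} \le \mathbb{P}\left\{ Z \ge \frac{\Delta_k - 2\sigma(k_*+1)^{-1/2}(3\sqrt{2} - 2)}{\sqrt{\operatorname{var}(\hat\Delta_{k_*} - \hat\Delta_k)}} \right\} + \mathbb{P}\left\{ Z \ge \frac{\Delta_k - 2\sigma(k_*+1)^{-1/2}}{\sqrt{\operatorname{var}(\hat\Delta_k)}} \right\}.$$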

By the Cauchy-Schwarz inequality and Lemma A.3, we get, for A; > A:*, 


var(Afc) 


. 


var(Afc^ - Afc) < Uvar(AfcJ + yj var(Afc) < + . < 


y/k yj fc* pT 'v/fc* pT 

Also $\operatorname{var}(\hat\Delta_k) \le \sigma^2/(k + 1) \le \sigma^2/(k_* + 1)$. Therefore if $k > k_*$, $k \in \mathcal{I}$ is such that

$$\Delta_k \ge 2\sigma(k_* + 1)^{-1/2}(3\sqrt{2} - 2), \qquad (50)$$

we obtain


P{fc = A:} < 


Z > 


<2¥< Z> 


Ak - 2a{K + ( 3^2 - 2 ) ) 

aV2{K + 1)-V2 J 

Afc - 2 ct(A:* + l)-i/2 (3^2-2) 


+ F<Z> 


Afc - 2cj(A;* + 1) ^/2 


(J 


ih + 1)-V2 


cr 


+ l)-l/2 


< 2 exp — 


/c* + 1 


(Afc - 2a(A:* + l)-^/\3^/2 - 2)) 


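Using the inequality $(x - y)^2 \ge x^2/2 - y^2$ with $x = \Delta_k$ and $y = 2\sigma(k_* + 1)^{-1/2}(3\sqrt{2} - 2)$, we obtain

$$\mathbb{P}\{\hat k = k\} \le 2\exp\left( 2(3\sqrt{2} - 2)^2 \right) \exp\left( -\frac{(k_* + 1)\Delta_k^2}{4\sigma^2} \right) \qquad (51)$$

whenever $k \in \mathcal{I}$, $k > k_*$, satisfies (50). It is easy to see that when (50) is not satisfied, the right
hand side above is larger than 2. Thus, inequality (51) is true for all $k \in \mathcal{I}$, $k > k_*$. As a result,

$$\Delta_k^2 \sqrt{\mathbb{P}\{\hat k = k\}} \le \sqrt{2}\exp\left( (3\sqrt{2} - 2)^2 \right) \xi(\Delta_k^2) \quad \text{for all } k \in \mathcal{I},\ k > k_*, \qquad (52)$$

where

$$\xi(z) := z\exp\left( -\frac{(k_* + 1)z}{8\sigma^2} \right) \quad \text{for } z \ge 0.$$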

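By (49) and (52), the proof would therefore be complete if we show that $\sum_{k \in \mathcal{I}: k > k_*} \xi(\Delta_k^2)$ is
bounded from above by a universal constant multiple of $\sigma^2/(k_* + 1)$. For this, note first that the function $\xi(z)$ is
decreasing for $z \ge \tilde z := 8\sigma^2/(k_* + 1)$ and attains its maximum over $z \ge 0$ at $z = \tilde z$. Note also that the
second part of inequality (89) gives $\Delta_k^2 \ge z_k$ for all $k \in \mathcal{I}$, $k > k_*$, where

$$z_k := \frac{(\sqrt{6} - 2)^2 \sigma^2 (k + 1)}{4(k_* + 1)^2}.$$

We therefore get

$$\begin{aligned}
\xi(\Delta_k^2) \le \xi(\max(z_k, \tilde z)) &= \max(z_k, \tilde z)\exp\left( -\frac{(k_* + 1)\max(z_k, \tilde z)}{8\sigma^2} \right) \\
&\le \max(z_k, \tilde z)\exp\left( -\frac{(k_* + 1)z_k}{8\sigma^2} \right) \le (z_k + \tilde z)\exp\left( -\frac{(k_* + 1)z_k}{8\sigma^2} \right).
\end{aligned}$$

Because $k > k_*$, it is easy to see that

$$\tilde z = \frac{8\sigma^2}{k_* + 1} \le \frac{8\sigma^2(k + 1)}{(k_* + 1)^2}.$$

We deduce that

$$\xi(\Delta_k^2) \le \left( \frac{(\sqrt{6} - 2)^2}{4} + 8 \right) \frac{\sigma^2(k + 1)}{(k_* + 1)^2} \exp\left( -\frac{(\sqrt{6} - 2)^2}{32} \cdot \frac{k + 1}{k_* + 1} \right).$$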


Denoting the constants above by $c_1$ and $c_2$, we can write

$$\sum_{k \in \mathcal{I}: k > k_*} \xi(\Delta_k^2) \le \frac{c_1\sigma^2}{k_* + 1} \sum_{k \in \mathcal{I}: k > k_*} \frac{k + 1}{k_* + 1} \exp\left( -\frac{k + 1}{c_2(k_* + 1)} \right).$$

The sum on the right hand side above is easily seen to be bounded from above by a universal
constant, because the ratios $(k + 1)/(k_* + 1)$ grow geometrically over $k \in \mathcal{I}$. This completes the
proof of Theorem 3.1.


6.2 Proof of Theorem 3.2 

This subsection is dedicated to the proof of Theorem 3.2. We use Lemma A.4 which is stated and
proved in Appendix A. We also use a classical inequality due to Le Cam (1986) which states that for
every estimator $\hat h$ and compact, convex set $L^*$,

$$\max\left\{ \mathbb{E}_{K^*}\left( \hat h - h_{K^*}(\theta_i) \right)^2,\ \mathbb{E}_{L^*}\left( \hat h - h_{L^*}(\theta_i) \right)^2 \right\} \ge \frac{1}{4}\left( h_{K^*}(\theta_i) - h_{L^*}(\theta_i) \right)^2 \left( 1 - \|\mathbb{P}_{K^*} - \mathbb{P}_{L^*}\|_{TV} \right). \qquad (53)$$

Here $\mathbb{P}_{L^*}$ is the product of the Gaussian probability measures with mean $h_{L^*}(\theta_i)$ and variance
$\sigma^2$ for $i = 1, \dots, n$. Also $\|P - Q\|_{TV}$ denotes the total variation distance between $P$ and $Q$.

For ease of notation, we assume, without loss of generality, that $\theta_i = 0$. We also write $\Delta_k$ for
$\Delta_k(\theta_i)$ and $k_*$ for $k_*(i)$.

Suppose first that $K^*$ satisfies the following condition: there exists some $\alpha \in (0, \pi/4)$ such that

$$\frac{h_{K^*}(\alpha) + h_{K^*}(-\alpha)}{2\cos\alpha} - h_{K^*}(0) > \frac{\sigma}{\sqrt{n_\alpha}} \qquad (54)$$

where $n_\alpha$ denotes the number of integers $i$ for which $-\alpha < 2i\pi/n < \alpha$. This condition will not be
satisfied, for example, when $K^*$ is a singleton. We shall handle such $K^*$ later. Observe that $n_\alpha \ge 1$
for all $0 < \alpha \le \pi/4$ because we can take $i = 0$.

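Let us define, for each $\alpha \in (0, \pi/4)$,

$$a_{K^*}(\alpha) := \left( \frac{h_{K^*}(\alpha) + h_{K^*}(-\alpha)}{2\cos\alpha},\ \frac{h_{K^*}(\alpha) - h_{K^*}(-\alpha)}{2\sin\alpha} \right)$$

and let $L^* = L^*(\alpha)$ be defined as the smallest convex set that contains both $K^*$ and the point
$a_{K^*}(\alpha)$. In other words, $L^*$ is the convex hull of $K^* \cup \{a_{K^*}(\alpha)\}$.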

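We now use Le Cam's inequality (53). To control the total variation distance in the right hand
side of (53), we use Pinsker's inequality:

$$\|\mathbb{P}_{K^*} - \mathbb{P}_{L^*}\|_{TV} \le \sqrt{\frac{1}{2} D(\mathbb{P}_{K^*} \| \mathbb{P}_{L^*})},$$

and the fact that (note that $\theta_i = 2\pi i/n - \pi$)

$$D(\mathbb{P}_{K^*} \| \mathbb{P}_{L^*}) = \frac{1}{2\sigma^2} \sum_{i=1}^{n} \left( h_{K^*}(2i\pi/n - \pi) - h_{L^*}(2i\pi/n - \pi) \right)^2.$$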
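The two facts just used (the Kullback-Leibler divergence between Gaussian product measures with common variance, and Pinsker's inequality) are easy to evaluate numerically. The following Python sketch is illustrative only; the function names and the example support values are assumptions made here, not objects from the paper.

```python
import math


def kl_gaussian_product(h_K, h_L, sigma):
    """KL divergence between the product measures with means h_K and h_L
    (lists of equal length) and common variance sigma^2."""
    return sum((a - b) ** 2 for a, b in zip(h_K, h_L)) / (2 * sigma ** 2)


def tv_upper_bound(h_K, h_L, sigma):
    """Pinsker's inequality: total variation <= sqrt(KL / 2)."""
    return math.sqrt(kl_gaussian_product(h_K, h_L, sigma) / 2)


# Hypothetical example: two support vectors on the grid that differ by a constant shift c.
n, sigma, c = 50, 0.5, 0.1
h_K = [1.0 + 0.2 * math.cos(2 * math.pi * i / n - math.pi) for i in range(1, n + 1)]
h_L = [v + c for v in h_K]
print(kl_gaussian_product(h_K, h_L, sigma))
print(tv_upper_bound(h_K, h_L, sigma))
```

For a constant shift $c$ the divergence equals $nc^2/(2\sigma^2)$, so the total variation bound stays below one only when $c$ is of order $\sigma/\sqrt{n}$ or smaller, which is the scale of perturbation the lower-bound argument relies on.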
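The support function of $L^*$ is easily seen to be the maximum of the support functions of $K^*$ and
the singleton $\{a_{K^*}(\alpha)\}$. Therefore,

$$\begin{aligned}
h_{L^*}(\theta) &:= \max\left( h_{K^*}(\theta),\ \frac{h_{K^*}(\alpha) + h_{K^*}(-\alpha)}{2\cos\alpha}\cos\theta + \frac{h_{K^*}(\alpha) - h_{K^*}(-\alpha)}{2\sin\alpha}\sin\theta \right) \\
&= \max\left( h_{K^*}(\theta),\ \frac{\sin(\theta + \alpha)}{\sin 2\alpha} h_{K^*}(\alpha) + \frac{\sin(\alpha - \theta)}{\sin 2\alpha} h_{K^*}(-\alpha) \right).
\end{aligned}$$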

Using (1), it can be shown that

$$h_{K^*}(\theta) \le \frac{\sin(\theta + \alpha)}{\sin 2\alpha} h_{K^*}(\alpha) + \frac{\sin(\alpha - \theta)}{\sin 2\alpha} h_{K^*}(-\alpha) \quad \text{for } -\alpha \le \theta \le \alpha, \qquad (55)$$

and

$$h_{K^*}(\theta) \ge \frac{\sin(\theta + \alpha)}{\sin 2\alpha} h_{K^*}(\alpha) + \frac{\sin(\alpha - \theta)}{\sin 2\alpha} h_{K^*}(-\alpha) \quad \text{for } \theta \in [-\pi, -\alpha] \cup [\alpha, \pi]. \qquad (56)$$

To see this, assume that $\theta \ge 0$ without loss of generality. We then work with the two separate cases
$\theta \in [0, \alpha]$ and $\theta \in [\alpha, \pi]$. In the first case, apply (1) with $\alpha_1 = \alpha$, $\alpha = \theta$ and $\alpha_2 = -\alpha$ to get (55).
In the second case, apply (1) with $\alpha_1 = \theta$, $\alpha = \alpha$ and $\alpha_2 = -\alpha$ to get (56).

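As a result of (55) and (56), we get that

$$h_{L^*}(\theta) = \frac{\sin(\theta + \alpha)}{\sin 2\alpha} h_{K^*}(\alpha) + \frac{\sin(\alpha - \theta)}{\sin 2\alpha} h_{K^*}(-\alpha) \quad \text{for } -\alpha \le \theta \le \alpha,$$

and that $h_{L^*}(\theta)$ equals $h_{K^*}(\theta)$ for every other $\theta$ in $(-\pi, \pi]$.

We now give an upper bound on $h_{L^*}(\theta) - h_{K^*}(\theta)$ for $0 \le \theta \le \alpha$. Using (1) with $\alpha_1 = \theta$, $\alpha = 0$
and $\alpha_2 = -\alpha$, we obtain

$$h_{K^*}(\theta) \ge \frac{\sin(\alpha + \theta)}{\sin\alpha} h_{K^*}(0) - \frac{\sin\theta}{\sin\alpha} h_{K^*}(-\alpha).$$

Thus for $0 \le \theta \le \alpha$, we obtain the inequality

$$\begin{aligned}
0 \le h_{L^*}(\theta) - h_{K^*}(\theta) &= \frac{\sin(\theta + \alpha)}{\sin 2\alpha} h_{K^*}(\alpha) + \frac{\sin(\alpha - \theta)}{\sin 2\alpha} h_{K^*}(-\alpha) - h_{K^*}(\theta) \\
&\le \frac{\sin(\theta + \alpha)}{\sin\alpha} \left( \frac{h_{K^*}(\alpha) + h_{K^*}(-\alpha)}{2\cos\alpha} - h_{K^*}(0) \right).
\end{aligned}$$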

Because 0 < a < vr/d, 0 < 9 < a, we use the fact that the sine function is increasing on (0,7r/2) to 
deduce that 

0 < hi* {9) — hx*{9) < ^ ^ ^ ^^ — hx*{0) for all 0 < 0 < a. 

2 cos a 

One can similarly deduce the same inequality for the case —a < 0 < 0 as well. 

Because of this and the fact that hL*{9) equals hx*{9) for all 0 in (—7r,7r] that are not in the 
interval (—a, a), we obtain 

1 "" 

D{Px*\\Pl*) = ^ {hx*{2i7r/n - vr) - hL*{2i'K/n - vr))^ 


i=l 


< 


rig 

2cj2 


hx*{a) + hx*{-a) 
2 cos a 


— hx*{^) 
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Also because /iL*(0) = {hK*ia) + a))/(2 cos a), we obtain, by (53), that 


r > 


1 f hK*{oi) + hK*{-oi) 
2 cos a 


— hK*(0) ) ( 1 — 


for every 0 < a < 7r/4 where 


r := inf max 
h 


ng f hK*{a) + hK*{-a) 

4cr2 V 2 cos a 


"&K* [h — hK*{0i)] I'&L* {h — hL*{9i 


— hK*{0) 


(57) 


(58) 


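where the infimum above is over all estimators $\hat h$. Let us now define $\alpha_*$ by

$$\alpha_* := \inf\left\{ 0 < \alpha < \pi/4 : \frac{h_{K^*}(\alpha) + h_{K^*}(-\alpha)}{2\cos\alpha} - h_{K^*}(0) > \frac{\sigma}{\sqrt{n_\alpha}} \right\}.$$

Note first that $\alpha_* > 0$ because $n_\alpha \ge 1$ for all $\alpha$ and $n_\alpha = 1$ for $\alpha$ very small, so that $\sigma/\sqrt{n_\alpha} = \sigma$ there, while the quantity
$(h_{K^*}(\alpha) + h_{K^*}(-\alpha))/(2\cos\alpha) - h_{K^*}(0)$ becomes close to 0 for small $\alpha$ (by continuity of $h_{K^*}(\cdot)$).

Also because we have assumed (54), it follows that $0 < \alpha_* < \pi/4$. Now for each $\epsilon > 0$ sufficiently
small, we have

$$\frac{h_{K^*}(\alpha_* - \epsilon) + h_{K^*}(-\alpha_* + \epsilon)}{2\cos(\alpha_* - \epsilon)} - h_{K^*}(0) \le \frac{\sigma}{\sqrt{n_{\alpha_* - \epsilon}}}.$$

Letting $\epsilon \downarrow 0$ in the above and using the fact that $n_{\alpha_* - \epsilon} \to n_{\alpha_*}$ and the continuity of $h_{K^*}$, we
deduce

$$\frac{h_{K^*}(\alpha_*) + h_{K^*}(-\alpha_*)}{2\cos\alpha_*} - h_{K^*}(0) \le \frac{\sigma}{\sqrt{n_{\alpha_*}}}. \qquad (59)$$

Because $0 < \alpha_* < \pi/4$, by the definition of the infimum, there exists a decreasing sequence
$\{\alpha_k\} \subset (0, \pi/4)$ converging to $\alpha_*$ such that

$$\frac{h_{K^*}(\alpha_k) + h_{K^*}(-\alpha_k)}{2\cos\alpha_k} - h_{K^*}(0) > \frac{\sigma}{\sqrt{n_{\alpha_k}}} \quad \text{for all } k.$$

For $k$ large, $n_{\alpha_k}$ is either $n_{\alpha_*}$ or $n_{\alpha_*} + 2$, and hence letting $k \to \infty$, we get

$$\frac{h_{K^*}(\alpha_*) + h_{K^*}(-\alpha_*)}{2\cos\alpha_*} - h_{K^*}(0) \ge \frac{\sigma}{\sqrt{n_{\alpha_*} + 2}} \ge \frac{1}{\sqrt{3}} \cdot \frac{\sigma}{\sqrt{n_{\alpha_*}}}$$

where we also used that $n_{\alpha_*} \ge 1$. Combining the above with (59), we conclude that

$$\frac{1}{\sqrt{3}} \cdot \frac{\sigma}{\sqrt{n_{\alpha_*}}} \le \frac{h_{K^*}(\alpha_*) + h_{K^*}(-\alpha_*)}{2\cos\alpha_*} - h_{K^*}(0) \le \frac{\sigma}{\sqrt{n_{\alpha_*}}}.$$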

Using $\alpha = \alpha_*$ in (57), we get

$$r \ge \frac{\sigma^2}{24\, n_{\alpha_*}}. \qquad (60)$$

We shall now show that

$$\alpha_* \le \tilde\alpha := \frac{8(k_* + 1)\pi}{n} \qquad (61)$$

when $8(k_* + 1)\pi/n \le \pi/4$ (otherwise (61) is obvious). This would imply, because $\alpha \mapsto n_\alpha$ is
non-decreasing, that

$$n_{\alpha_*} \le n_{\tilde\alpha} = \frac{n\tilde\alpha}{\pi} - 1 = 8k_* + 7.$$

This and (60) would give

$$r \ge \frac{\sigma^2}{24(8k_* + 7)} \ge \frac{c\sigma^2}{k_* + 1}$$

for a positive constant $c$. This would prove the theorem when assumption (54) is true.
To prove (61), we only need to show that

$$\frac{h_{K^*}(\tilde\alpha) + h_{K^*}(-\tilde\alpha)}{2\cos\tilde\alpha} - h_{K^*}(0) > \frac{\sigma}{\sqrt{n_{\tilde\alpha}}} = \frac{\sigma}{\sqrt{8k_* + 7}}. \qquad (62)$$

We verify this via Lemma A.4 on a case-by-case basis. When $k_* = 0$, we have $\tilde\alpha = 8\pi/n$ so that,
by Lemma A.4, the left hand side above is bounded from below by $\Delta_2$. Because $k_*$ is zero, by
definition of $k_*$, we have

$$\Delta_2 + \frac{2\sigma}{\sqrt{3}} \ge \Delta_0 + 2\sigma \ge 2\sigma.$$

This gives $\Delta_2 \ge 2\sigma(1 - (1/\sqrt{3}))$ which can be verified to be larger than $\sigma/\sqrt{8k_* + 7} = \sigma/\sqrt{7}$.

When $k_* = 1$, we have $\tilde\alpha = 16\pi/n$ so that, by Lemma A.4, the left hand side in (62) is bounded
from below by $\Delta_4$. Because $k_* = 1$, by definition of $k_*$, we have

$$\Delta_4 + \frac{2\sigma}{\sqrt{5}} \ge \Delta_1 + \frac{2\sigma}{\sqrt{2}} \ge \frac{2\sigma}{\sqrt{2}}$$

which gives $\Delta_4 \ge 2\sigma((1/\sqrt{2}) - (1/\sqrt{5}))$. This can be verified to be larger than $\sigma/\sqrt{8k_* + 7} = \sigma/\sqrt{15}$.

When $k_* \ge 2$, we again use Lemma A.4 to argue that the left hand side in (62) is bounded from
below by $\Delta_{2(k_*+1)}$. Because $\Delta_k$ is increasing in $k$ (Lemma A.1), we have $\Delta_{2(k_*+1)} \ge \Delta_{2k_*}$. By the
definition of $k_*$ (and the fact that $\Delta_{k_*} \ge 0$), we have

$$\Delta_{2k_*} \ge \frac{2\sigma}{\sqrt{k_* + 1}} - \frac{2\sigma}{\sqrt{2k_* + 1}} = \frac{2\sigma}{\sqrt{k_* + 1}} \left( 1 - \sqrt{\frac{k_* + 1}{2k_* + 1}} \right).$$

Because $k_* \ge 2$, it can be easily checked that $(k_* + 1)/(2k_* + 1) \le 3/5$ and $(8k_* + 7)/(k_* + 1) \ge 23/3$.
These, together with the fact that $2(1 - \sqrt{3/5})\sqrt{23/3} > 1$, imply (62). This completes the proof
of the theorem when assumption (54) holds.


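We now deal with the simpler case when (54) is violated. When (54) is violated, we first show
that

$$k_* \ge \frac{12n}{16(1 + 2\sqrt{3})^2} - 1. \qquad (63)$$

To see this, note first that, because (54) is violated, we have

$$\frac{h_{K^*}(\alpha) + h_{K^*}(-\alpha)}{2\cos\alpha} - h_{K^*}(0) \le \frac{\sigma}{\sqrt{n_\alpha}} \le \sigma\left( \frac{n\alpha}{\pi} - 1 \right)^{-1/2}$$

for all $\alpha \in (0, \pi/4]$. Lemma A.4 implies that for every $1 \le k \le n/16$, we get

$$\Delta_k \le \frac{h_{K^*}(4k\pi/n) + h_{K^*}(-4k\pi/n)}{2\cos(4k\pi/n)} - h_{K^*}(0) \le \frac{\sigma}{\sqrt{n_{4k\pi/n}}} \le \frac{\sigma}{\sqrt{4k - 1}}.$$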
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Now for every 


12n 


(64) 


k < 


- 1 , 


2cj 


2a 


16(1 + 2^3)2 

we have 

^ ^ 2a ^ 2a ^ a ^ _ ^ ^ ^ 

Vfc + 1 -v/AT+T Y^3n/16 Y^riTlS ^njX^ + 1 

It follows therefore that any k satisfying (64) cannot be a minimizer of + 2a{k + 1)“^/^, thereby 
implying (63). 


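Let $L^*$ be defined as the Minkowski sum of $K^*$ and the closed ball with center 0 and radius
$\sigma(3n/2)^{-1/2}$. In other words, $L^* := \{x + \sigma(3n/2)^{-1/2} y : x \in K^* \text{ and } \|y\| \le 1\}$. The support
function of $L^*$ can be checked to equal

$$h_{L^*}(\theta) = h_{K^*}(\theta) + \sigma(3n/2)^{-1/2}.$$

Le Cam's bound again gives

$$r \ge \frac{1}{4}\left( h_{K^*}(0) - h_{L^*}(0) \right)^2 \left\{ 1 - \|\mathbb{P}_{K^*} - \mathbb{P}_{L^*}\|_{TV} \right\}$$

where $r$ is as defined in (58). By use of Pinsker's inequality, we have

$$\|\mathbb{P}_{K^*} - \mathbb{P}_{L^*}\|_{TV} \le \sqrt{\frac{1}{4\sigma^2} \sum_{i=1}^{n} \left( h_{K^*}(2i\pi/n - \pi) - h_{L^*}(2i\pi/n - \pi) \right)^2} = \frac{1}{\sqrt{6}}. \qquad (65)$$

Therefore, from (65) and (63), we get that

$$r \ge \frac{\sigma^2}{6n}\left( 1 - \frac{1}{\sqrt{6}} \right) \ge \frac{\sigma^2}{12n} \ge \frac{\sigma^2}{16(1 + 2\sqrt{3})^2} \cdot \frac{1}{k_* + 1}.$$

This completes the proof of Theorem 3.2.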

6.3 Proof of Theorem 3.6 

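Recall the definition of $\hat h^P$ in (13) and the definition of the estimator $\hat K$ in (14). The first thing to
note is that

$$h_{\hat K}(\theta_i) = \hat h_i^P \quad \text{for every } i = 1, \dots, n. \qquad (66)$$

To see this, observe first that, because $\hat h^P = (\hat h_1^P, \dots, \hat h_n^P)$ is a valid support vector, there exists a
set $\tilde K$ with $h_{\tilde K}(\theta_i) = \hat h_i^P$ for every $i$. It is now trivial (from the definition of $\hat K$) to see that $\tilde K \subseteq \hat K$
which implies that $h_{\hat K}(\theta_i) \ge h_{\tilde K}(\theta_i) = \hat h_i^P$. On the other hand, the definition of $\hat K$ immediately
gives $h_{\hat K}(\theta_i) \le \hat h_i^P$.

The observation (66) immediately gives

$$\mathbb{E}_{K^*} L_f(K^*, \hat K) = \mathbb{E}_{K^*} \frac{1}{n} \sum_{i=1}^{n} \left( h_{K^*}(\theta_i) - \hat h_i^P \right)^2.$$

It will be convenient here to introduce the following notation. Let $h_{K^*}^{\mathrm{vec}}$ denote the vector $(h_{K^*}(\theta_1), \dots, h_{K^*}(\theta_n))$.
Also, for $u, v \in \mathbb{R}^n$, let $\ell(u, v)$ denote the scaled Euclidean distance defined by $\ell^2(u, v) := \sum_{i=1}^{n} (u_i - v_i)^2/n$.
With this notation, we have

$$\mathbb{E}_{K^*} L_f(K^*, \hat K) = \mathbb{E}_{K^*} \ell^2(h_{K^*}^{\mathrm{vec}}, \hat h^P). \qquad (67)$$

Recall that $\hat h^P$ is the projection of $\hat h := (\hat h_1, \dots, \hat h_n)$ onto $\mathcal{H}$. Because $\mathcal{H}$ is a closed convex subset
of $\mathbb{R}^n$, it follows that (see, for example, Stark and Yang (1998))

$$\ell^2(\hat h, h) \ge \ell^2(\hat h, \hat h^P) + \ell^2(h, \hat h^P) \quad \text{for every } h \in \mathcal{H}.$$

In particular, with $h = h_{K^*}^{\mathrm{vec}}$, we obtain $\ell^2(h_{K^*}^{\mathrm{vec}}, \hat h^P) \le \ell^2(h_{K^*}^{\mathrm{vec}}, \hat h)$. Combining this with (67), we
obtain

$$\mathbb{E}_{K^*} L_f(K^*, \hat K) \le \mathbb{E}_{K^*} \ell^2(h_{K^*}^{\mathrm{vec}}, \hat h) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}_{K^*} \left( \hat h_i - h_{K^*}(\theta_i) \right)^2. \qquad (68)$$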


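In Theorem 3.1, we proved that

$$\mathbb{E}_{K^*} \left( \hat h_i - h_{K^*}(\theta_i) \right)^2 \le \frac{C\sigma^2}{k_*(i) + 1} \quad \text{for every } i = 1, \dots, n.$$

This implies that

$$\mathbb{E}_{K^*} L_f(K^*, \hat K) \le \frac{C\sigma^2}{n} \sum_{i=1}^{n} \frac{1}{k_*(i) + 1}.$$

For inequality (28), it is therefore enough to prove that

$$\sum_{i=1}^{n} \frac{1}{k_*(i) + 1} \le C\left\{ 1 + \left( \frac{R\sqrt{n}}{\sigma} \right)^{2/5} \right\}. \qquad (69)$$

Our following proof of (69) is inspired by an argument due to Zhang (2002, Theorem 2.1) in a very
different context.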


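Recall that $k_*(i)$ takes values in $\mathcal{I} := \{0\} \cup \{2^j : j \ge 0,\ 2^j \le \lfloor n/16 \rfloor\}$. For $k \in \mathcal{I}$, let

$$\rho(k) := \sum_{i=1}^{n} I\{k_*(i) = k\} \quad \text{and} \quad \ell(k) := \sum_{i=1}^{n} I\{k_*(i) < k\}.$$

Note that $\ell(0) = 0$, $\ell(1) = \rho(0)$ and $\rho(k) = \ell(2k) - \ell(k)$ for $k \ge 1$, $k \in \mathcal{I}$. As a result,

$$\sum_{i=1}^{n} \frac{1}{k_*(i) + 1} = \sum_{k \in \mathcal{I}} \frac{\rho(k)}{k + 1} = \ell(1) + \sum_{k \ge 1,\ k \in \mathcal{I}} \frac{\ell(2k) - \ell(k)}{k + 1}.$$

Let $K$ denote the maximum element of $\mathcal{I}$. Because $\ell(2K) = n$, we can write

$$\sum_{i=1}^{n} \frac{1}{k_*(i) + 1} = \frac{n}{K + 1} + \frac{\ell(1)}{2} + \sum_{k \ge 2,\ k \in \mathcal{I}} \frac{k\,\ell(k)}{(k + 1)(k + 2)}. \qquad (70)$$

Using $n/(K + 1) \le C$ and loose bounds for the other terms above, we obtain

$$\sum_{i=1}^{n} \frac{1}{k_*(i) + 1} \le C + \sum_{k \ge 1,\ k \in \mathcal{I}} \frac{\ell(k)}{k}.$$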

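We shall show below that

$$\ell(k) \le \min\left( n,\ \frac{A R k^{5/2}}{\sigma n} \right) \quad \text{for all } k \in \mathcal{I} \qquad (71)$$

for a universal positive constant $A$. Before that, let us first prove (69) assuming (71). Assuming
(71), we can write

$$\sum_{k \ge 1,\ k \in \mathcal{I}} \frac{\ell(k)}{k} = \sum_{k \ge 1,\ k \in \mathcal{I}} \frac{\ell(k)}{k}\, I\left\{ k \le \left( \frac{\sigma n^2}{AR} \right)^{2/5} \right\} + \sum_{k \ge 1,\ k \in \mathcal{I}} \frac{\ell(k)}{k}\, I\left\{ k > \left( \frac{\sigma n^2}{AR} \right)^{2/5} \right\}. \qquad (72)$$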


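In the first term on the right hand side above, we use the bound $\ell(k) \le ARk^{5/2}/(\sigma n)$. We then get

$$\sum_{k \ge 1,\ k \in \mathcal{I}} \frac{\ell(k)}{k}\, I\left\{ k \le \left( \frac{\sigma n^2}{AR} \right)^{2/5} \right\} \le \frac{AR}{\sigma n} \sum_{k \ge 1,\ k \in \mathcal{I}} k^{3/2}\, I\left\{ k \le \left( \frac{\sigma n^2}{AR} \right)^{2/5} \right\}.$$

Because $\mathcal{I}$ consists of integers of the form $2^j$, the sum on the right hand side above is bounded from
above by a constant multiple of the last term. This gives

$$\sum_{k \ge 1,\ k \in \mathcal{I}} \frac{\ell(k)}{k}\, I\left\{ k \le \left( \frac{\sigma n^2}{AR} \right)^{2/5} \right\} \le C\, \frac{AR}{\sigma n} \left( \frac{\sigma n^2}{AR} \right)^{3/5} = C A^{2/5} \left( \frac{R\sqrt{n}}{\sigma} \right)^{2/5}. \qquad (73)$$

For the second term on the right hand side in (72), we use the bound $\ell(k) \le n$ which gives

$$\sum_{k \ge 1,\ k \in \mathcal{I}} \frac{\ell(k)}{k}\, I\left\{ k > \left( \frac{\sigma n^2}{AR} \right)^{2/5} \right\} \le n \sum_{k \ge 1,\ k \in \mathcal{I}} k^{-1}\, I\left\{ k > \left( \frac{\sigma n^2}{AR} \right)^{2/5} \right\}.$$

Again, because $\mathcal{I}$ consists of integers of the form $2^j$, the sum on the right hand side above is bounded
from above by a constant multiple of the first term. This gives

$$\sum_{k \ge 1,\ k \in \mathcal{I}} \frac{\ell(k)}{k}\, I\left\{ k > \left( \frac{\sigma n^2}{AR} \right)^{2/5} \right\} \le Cn \left( \frac{\sigma n^2}{AR} \right)^{-2/5} = C A^{2/5} \left( \frac{R\sqrt{n}}{\sigma} \right)^{2/5}. \qquad (74)$$

Inequalities (73) and (74) in conjunction with (70) prove (69), which completes the proof of
(28).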

We only need to prove (71). For this, observe first that when $k_*(i) < k$, Corollary 3.5 gives that

$$\Delta_k(\theta_i) \ge \frac{(\sqrt{6} - 2)\sigma}{\sqrt{k + 1}}. \qquad (75)$$

This is because if (75) is violated, then Corollary 3.5 gives $k \le k_*(i)$. Consequently, we have

$$I\{k_*(i) < k\} \le \frac{\Delta_k(\theta_i)\sqrt{k + 1}}{(\sqrt{6} - 2)\sigma}$$

and

$$\ell(k) \le \frac{\sqrt{k + 1}}{(\sqrt{6} - 2)\sigma} \sum_{i=1}^{n} \Delta_k(\theta_i) \quad \text{for every } k \in \mathcal{I}. \qquad (76)$$

Now using the expression (36) for $\Delta_k(\theta_i)$, it is easy to see that

$$\sum_{i=1}^{n} \Delta_k(\theta_i) = \frac{1}{k + 1} \sum_{j=0}^{k} \delta_j \qquad (77)$$


where $\delta_j$ is given by

$$\delta_j := \sum_{i=1}^{n} \left( \frac{h_{K^*}(\theta_i + 4j\pi/n) + h_{K^*}(\theta_i - 4j\pi/n)}{2} - \frac{\cos(4j\pi/n)}{\cos(2j\pi/n)} \cdot \frac{h_{K^*}(\theta_i + 2j\pi/n) + h_{K^*}(\theta_i - 2j\pi/n)}{2} \right).$$

We will now prove an upper bound for $\delta_j$ under the assumption that $K^*$ is contained in a ball of
radius $R \ge 0$. We may assume without loss of generality that this ball is centered at the origin
because the expression for $\delta_j$ above remains unchanged if $h_{K^*}(\theta)$ is replaced by $h_{K^*}(\theta) - a_1\cos\theta -
a_2\sin\theta$ for any $(a_1, a_2) \in \mathbb{R}^2$. Because $\theta_i = 2\pi i/n - \pi$, we can rewrite $\delta_j$ as

$$\delta_j = \sum_{i=1}^{n} \left( \frac{h_{K^*}(\theta_{i+2j}) + h_{K^*}(\theta_{i-2j})}{2} - \frac{\cos(4j\pi/n)}{\cos(2j\pi/n)} \cdot \frac{h_{K^*}(\theta_{i+j}) + h_{K^*}(\theta_{i-j})}{2} \right).$$

Because $\theta \mapsto h_{K^*}(\theta)$ is a periodic function of period $2\pi$, the above expression only depends on
$h_{K^*}(\theta_1), \dots, h_{K^*}(\theta_n)$. In fact, it is easy to see that

$$\delta_j = \left( 1 - \frac{\cos(4j\pi/n)}{\cos(2j\pi/n)} \right) \sum_{i=1}^{n} h_{K^*}(\theta_i).$$

Now because $K^*$ is contained in the ball of radius $R$ centered at the origin, it follows that $|h_{K^*}(\theta_i)| \le
R$ for each $i$ which gives

$$\delta_j \le nR\left( 1 - \frac{\cos(4j\pi/n)}{\cos(2j\pi/n)} \right) \le nR\left( 1 - \frac{\cos(4k\pi/n)}{\cos(2k\pi/n)} \right) = \frac{nR\,(1 + 2\cos(2\pi k/n))}{\cos(2\pi k/n)} \left( 1 - \cos(2\pi k/n) \right)$$

for all $0 \le j \le k$. Because $k \le n/16$ for all $k \in \mathcal{I}$, it follows that

$$\delta_j \le 8nR\sin^2(\pi k/n) \le \frac{8R\pi^2 k^2}{n} \quad \text{for all } 0 \le j \le k.$$

The identity (77) therefore gives $\sum_{i=1}^{n} \Delta_k(\theta_i) \le 8R\pi^2 k^2/n$ for all $k \in \mathcal{I}$. Consequently, from (76)
and the trivial fact that $\ell(k) \le n$, we obtain

$$\ell(k) \le \min\left( n,\ \frac{8\pi^2 R k^2 \sqrt{k + 1}}{(\sqrt{6} - 2)\sigma n} \right) \quad \text{for all } k \in \mathcal{I}.$$

Note that $\ell(0) = 0$ so that the above inequality only gives something useful for $k \ge 1$. Using
$k + 1 \le 2k$ for $k \ge 1$ and denoting the resulting constant by $A$, we obtain (71). This completes the
proof of Theorem 3.6.
6.4 Proof of Theorem 3.7

The following lemma will be crucially used in our proof of Theorem 3.7. For every compact, convex
set $P$ and $i = 1, \dots, n$, let $k_*^P(i)$ denote the quantity $k_*(i)$ with $K^*$ replaced by $P$. More precisely,

$$k_*^P(i) := \operatorname*{argmin}_{k \in \mathcal{I}} \left( \Delta_k^P(\theta_i) + \frac{2\sigma}{\sqrt{k + 1}} \right)$$

where $\Delta_k^P(\theta_i)$ is given by

$$\Delta_k^P(\theta_i) = \frac{1}{k + 1} \sum_{j=0}^{k} \left( \frac{h_P(\theta_i + 4j\pi/n) + h_P(\theta_i - 4j\pi/n)}{2} - \frac{\cos(4j\pi/n)}{\cos(2j\pi/n)} \cdot \frac{h_P(\theta_i + 2j\pi/n) + h_P(\theta_i - 2j\pi/n)}{2} \right).$$
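The definition of $k_*^P(i)$ is a minimization over the finite dyadic set $\mathcal{I}$ and can be evaluated directly. The Python sketch below is a hedged illustration: the names `index_set`, `delta_k` and `k_star` are hypothetical, ties in the minimum are broken in favour of the smallest $k$ (an assumption, since the text does not specify a tie-breaking rule), and the two support functions are toy examples.

```python
import math


def index_set(n):
    """I = {0} together with the powers of two 2^j <= floor(n/16)."""
    ks, p = [0], 1
    while p <= n // 16:
        ks.append(p)
        p *= 2
    return ks


def delta_k(h, theta, k, n):
    """Delta_k^P(theta) for the support function h of a set P."""
    total = 0.0
    for j in range(k + 1):
        wide, narrow = 4 * math.pi * j / n, 2 * math.pi * j / n
        outer = (h(theta + wide) + h(theta - wide)) / 2
        inner = (h(theta + narrow) + h(theta - narrow)) / 2
        total += outer - math.cos(wide) / math.cos(narrow) * inner
    return total / (k + 1)


def k_star(h, theta, n, sigma):
    """Minimizer over I of Delta_k(theta) + 2*sigma/sqrt(k+1)."""
    return min(index_set(n),
               key=lambda k: delta_k(h, theta, k, n) + 2 * sigma / math.sqrt(k + 1))


def point(t):
    # Support function of the point (0.5, 0.25): Delta_k vanishes for every k.
    return 0.5 * math.cos(t) + 0.25 * math.sin(t)


def segment(t):
    # Support function of the vertical segment joining (0, 1) and (0, -1).
    return abs(math.sin(t))


n, sigma = 256, 0.1
print(index_set(n))
print(k_star(point, 0.0, n, sigma))    # largest element of I
print(k_star(segment, 0.0, n, sigma))  # a much smaller bandwidth
```

At $\theta = 0$ the segment has a kink in its support function, so the bias term grows linearly in $k$ and the minimizer stays small, whereas for a point the largest available $k$ is selected.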

The next lemma states that for every $i = 1, \dots, n$, the risk $\mathbb{E}_{K^*}(\hat h_i - h_{K^*}(\theta_i))^2$ can be bounded
from above by a combination of $k_*^P(i)$ and how well $K^*$ can be approximated by $P$. This result
holds for every $P$. The approximation of $K^*$ by $P$ is measured in terms of the Hausdorff distance
(defined in (31)).

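Lemma 6.1 (Approximation). There exists a universal positive constant $C$ such that for every $i = 1,\dots,n$ and every compact, convex set $P$, we have

$$E_{K^*}\bigl(\hat h_i - h_{K^*}(\theta_i)\bigr)^2 \le C\left( \frac{\sigma^2}{k_*^P(i) + 1} + \ell_H^2(K^*, P) \right). \qquad (78)$$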



Proof of Lemma 6.1. Fix $i \in \{1,\dots,n\}$ and a compact, convex set $P$. For notational convenience, we write $\Delta_k$, $\Delta_k^P$, $k_*$ and $k_*^P$ for $\Delta_k(\theta_i)$, $\Delta_k^P(\theta_i)$, $k_*(i)$ and $k_*^P(i)$ respectively.

We assume that the following condition holds:

$$k_*^P + 1 > \frac{24(\sqrt{2}-1)}{\sqrt{6}-2}\,(k_* + 1). \qquad (79)$$

If this condition does not hold, we have

$$\frac{1}{k_* + 1} \le \frac{24(\sqrt{2}-1)}{\sqrt{6}-2}\,\frac{1}{k_*^P + 1}$$

and then (78) immediately follows from Theorem 3.1.


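Note that (79) implies, in particular, that $k_*^P > k_*$. Inequality (89) in Lemma A.2 applied to $k = k_*^P$ implies therefore that

$$\Delta_{k_*^P} \ge \frac{(\sqrt{6}-2)\sqrt{k_*^P+1}\,\sigma}{2(k_* + 1)}.$$

Also inequality (88) applied to the set $P$ instead of $K^*$ gives

$$\Delta^P_{k_*^P} \le \frac{6(\sqrt{2}-1)\sigma}{\sqrt{k_*^P + 1}}.$$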
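Combining the above pair of inequalities, we obtain

$$\Delta_{k_*^P} - \Delta^P_{k_*^P} \ge \frac{(\sqrt{6}-2)\sqrt{k_*^P+1}\,\sigma}{2(k_* + 1)} - \frac{6(\sqrt{2}-1)\sigma}{\sqrt{k_*^P + 1}}.$$

The right hand side above is non-decreasing in $k_*^P + 1$ and so we can replace $k_*^P + 1$ by the lower bound in (79) to obtain, after some simplification,

$$\Delta_{k_*^P} - \Delta^P_{k_*^P} \ge \frac{c\,\sigma}{\sqrt{k_* + 1}} \qquad (80)$$

for a universal positive constant $c$. The key now is to observe that

$$|\Delta_k - \Delta_k^P| \le 2\ell_H(K^*, P) \quad \text{for all } k. \qquad (81)$$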


This follows from the definition (31) of the Hausdorff distance which gives

$$|\Delta_k - \Delta_k^P| \le \ell_H(K^*, P)\left(1 + \max_{0 \le j \le k} \frac{\cos(4j\pi/n)}{\cos(2j\pi/n)}\right)$$

and this clearly implies (81) because $\cos(4j\pi/n)/\cos(2j\pi/n) \le 1$ for all $0 \le j \le k$.
From (81) and (80), we deduce that

$$\ell_H(K^*, P) \ge \frac{c\,\sigma}{\sqrt{k_* + 1}}$$

for a universal positive constant $c$. This, together with inequality (17), clearly implies (78) which completes the proof. □
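The "simplification" step in the proof of Lemma 6.1 is a short computation with the constant $24(\sqrt{2}-1)/(\sqrt{6}-2)$ appearing in the dichotomy around (79). The sketch below (names are illustrative, not from the paper) evaluates the lower bound on $\Delta_{k_*^P} - \Delta^P_{k_*^P}$, in units of $\sigma/\sqrt{k_*+1}$, as a function of the ratio $(k_*^P+1)/(k_*+1)$, and confirms that it is positive at the threshold and increasing beyond it.

```python
import math

# Constant from the dichotomy around condition (79).
C0 = 24 * (math.sqrt(2) - 1) / (math.sqrt(6) - 2)

def gap_coefficient(ratio):
    """Lower bound on (Delta - Delta^P) in units of sigma / sqrt(k_* + 1).

    `ratio` stands for (k_*^P + 1) / (k_* + 1); the two terms come from
    inequality (89) and inequality (88) respectively.
    """
    first = (math.sqrt(6) - 2) * math.sqrt(ratio) / 2
    second = 6 * (math.sqrt(2) - 1) / math.sqrt(ratio)
    return first - second

def threshold_value():
    """Value of the bound when the ratio equals the threshold C0."""
    return gap_coefficient(C0)
```

Under these assumptions the value at the threshold equals $6(\sqrt{2}-1)/\sqrt{C_0}$, which is one admissible choice for the universal constant in the lower bound.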


We are now ready to prove Theorem 3.7. 


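Proof of Theorem 3.7. We use inequality (68) from the proof of Theorem 3.6. This inequality, along with (78) for $i = 1,\dots,n$, gives

$$E_{K^*} L_f(K^*, \hat K) \le \frac{1}{n}\sum_{i=1}^n E_{K^*}\bigl(\hat h_i - h_{K^*}(\theta_i)\bigr)^2 \le C\left( \frac{\sigma^2}{n}\sum_{i=1}^n \frac{1}{k_*^P(i) + 1} + \ell_H^2(K^*, P) \right)$$

for every compact, convex set $P$. By restricting $P$ to be in the class $\mathcal{P}$ of polytopes, we get

$$E_{K^*} L_f(K^*, \hat K) \le C \inf_{P \in \mathcal{P}} \left( \frac{\sigma^2}{n}\sum_{i=1}^n \frac{1}{k_*^P(i) + 1} + \ell_H^2(K^*, P) \right).$$

For the proof of (32), it is therefore enough to show that

$$\sum_{i=1}^n \frac{1}{k_*^P(i) + 1} \le C\,v_P \log(en/v_P) \quad \text{for every } P \in \mathcal{P} \qquad (82)$$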
where $v_P$ denotes the number of extreme points of $P$ and $C$ is a universal positive constant. Fix a polytope $P$ with $v_P = k$. Let the extreme points of $P$ be $z_1,\dots,z_k$. Let $S_1,\dots,S_k$ denote a partition of $\{\theta_1,\dots,\theta_n\}$ into $k$ nonempty sets such that for each $j = 1,\dots,k$, we have

$$h_P(\theta_i) = z_j(1)\cos\theta_i + z_j(2)\sin\theta_i \quad \text{for all } \theta_i \in S_j$$

where $z_j = (z_j(1), z_j(2))$. For (82), it is enough to prove that

$$\sum_{i : \theta_i \in S_j} \frac{1}{k_*^P(i) + 1} \le C\log(e n_j) \quad \text{for every } j = 1,\dots,k \qquad (83)$$

where $n_j$ is the cardinality of $S_j$. This is because we can write


$$\sum_{i=1}^n \frac{1}{k_*^P(i) + 1} = \sum_{j=1}^k \sum_{i : \theta_i \in S_j} \frac{1}{k_*^P(i) + 1} \le C\sum_{j=1}^k \log(e n_j) \le C k \log(en/k)$$

where we used the concavity of $x \mapsto \log(ex)$. We prove (83) below. Fix $1 \le j \le k$. The inequality is obvious if $S_j$ is a singleton because $k_*^P(i) \ge 0$. So suppose that $n_j = m \ge 2$. Without loss of generality assume that $S_j = \{\theta_{u+1},\dots,\theta_{u+m}\}$ where $0 \le u \le n - m$. The definition of $S_j$ implies that

$$h_P(\theta) = z_j(1)\cos\theta + z_j(2)\sin\theta \quad \text{for all } \theta \in [\theta_{u+1}, \theta_{u+m}].$$
We can therefore apply inequality (23) to claim the existence of a positive constant c such that 


$$k_*^P(i) \ge c\,n \min(\theta_i - \theta_{u+1},\ \theta_{u+m} - \theta_i) \quad \text{for all } u+1 \le i \le u+m.$$

The minimum with $\pi$ in (23) is redundant here because $\theta_{u+m} - \theta_{u+1} < 2\pi$. Because $\theta_i = 2\pi i/n - \pi$, we get

$$k_*^P(i) \ge 2\pi c \min(i - u - 1,\ u + m - i) \quad \text{for all } u+1 \le i \le u+m.$$

Therefore, there exists a universal constant $C$ such that

$$\sum_{i=u+1}^{u+m} \frac{1}{k_*^P(i) + 1} \le C\sum_{i=1}^m \frac{1}{1 + \min(i-1, m-i)} \le C\sum_{i=1}^m \frac{1}{i} \le C\log(em).$$

This proves (83) thereby completing the proof of Theorem 3.7. □
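The two elementary facts used in the last part of this proof can be checked numerically. The following is a rough sketch, with illustrative function names, that tests (i) the harmonic-type bound $\sum_{i=1}^m 1/(1+\min(i-1,m-i)) \le 2\log(em)$ (the constant 2 is one admissible choice, assumed here for concreteness) and (ii) the concavity step $\sum_j \log(e n_j) \le k\log(en/k)$ for a partition with block sizes $n_1,\dots,n_k$ summing to $n$.

```python
import math
import random

def block_sum(m):
    """Sum over a block of m consecutive grid points of 1 / (1 + distance to the nearer end)."""
    return sum(1.0 / (1 + min(i - 1, m - i)) for i in range(1, m + 1))

def harmonic_bound_holds(max_m=500):
    """Check block_sum(m) <= 2 * log(e * m) for 1 <= m <= max_m."""
    return all(block_sum(m) <= 2 * math.log(math.e * m) + 1e-12
               for m in range(1, max_m + 1))

def partition_bound_holds(sizes):
    """Check sum_j log(e * n_j) <= k * log(e * n / k), which is Jensen's inequality for log."""
    n, k = sum(sizes), len(sizes)
    lhs = sum(math.log(math.e * s) for s in sizes)
    return lhs <= k * math.log(math.e * n / k) + 1e-9

def random_sizes(k, max_size, seed):
    """Random positive block sizes, used only to exercise the partition bound."""
    rng = random.Random(seed)
    return [rng.randint(1, max_size) for _ in range(k)]
```

Together the two checks mirror how (83) for each block is combined into the bound (82) of order $k\log(en/k)$.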


6.5 Proof of Theorem 3.8 


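Recall the definition (16) of the estimator $\hat K'$ and that of the interpolating function $\hat h'$ in (15). Following an argument similar to that used at the beginning of the proof of Theorem 3.6, we observe that

$$E_{K^*} L(K^*, \hat K') \le \int_{-\pi}^{\pi} E_{K^*}\bigl(h_{K^*}(\theta) - \hat h'(\theta)\bigr)^2\, d\theta = \sum_{i=1}^n \int_{\theta_i}^{\theta_{i+1}} E_{K^*}\bigl(h_{K^*}(\theta) - \hat h'(\theta)\bigr)^2\, d\theta. \qquad (84)$$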
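Now fix $1 \le i \le n$, $\theta_i \le \theta \le \theta_{i+1}$ and let $u(\theta) := E_{K^*}\bigl(h_{K^*}(\theta) - \hat h'(\theta)\bigr)^2$. Using the expression (15) for $\hat h'(\theta)$, we get that

$$u(\theta) = E_{K^*}\left( h_{K^*}(\theta) - \frac{\sin(\theta_{i+1} - \theta)}{\sin(\theta_{i+1} - \theta_i)}\,\hat h_i - \frac{\sin(\theta - \theta_i)}{\sin(\theta_{i+1} - \theta_i)}\,\hat h_{i+1} \right)^2.$$

We now write $\hat h_i = \hat h_i - h_{K^*}(\theta_i) + h_{K^*}(\theta_i)$ and a similar expression for $\hat h_{i+1}$. The elementary inequality $(a + b + c)^2 \le 3(a^2 + b^2 + c^2)$ along with $\max(\sin(\theta - \theta_i), \sin(\theta_{i+1} - \theta)) \le \sin(\theta_{i+1} - \theta_i)$ then imply that

$$u(\theta) \le 3E_{K^*}\bigl(\hat h_i - h_{K^*}(\theta_i)\bigr)^2 + 3E_{K^*}\bigl(\hat h_{i+1} - h_{K^*}(\theta_{i+1})\bigr)^2 + 3b^2(\theta)$$

where

$$b(\theta) := h_{K^*}(\theta) - \frac{\sin(\theta_{i+1} - \theta)}{\sin(\theta_{i+1} - \theta_i)}\,h_{K^*}(\theta_i) - \frac{\sin(\theta - \theta_i)}{\sin(\theta_{i+1} - \theta_i)}\,h_{K^*}(\theta_{i+1}).$$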

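Therefore from (84) (remember that $|\theta_{i+1} - \theta_i| = 2\pi/n$), we deduce

$$E_{K^*} L(K^*, \hat K') \le \frac{12\pi}{n}\sum_{i=1}^n E_{K^*}\bigl(\hat h_i - h_{K^*}(\theta_i)\bigr)^2 + 3\int_{-\pi}^{\pi} b^2(\theta)\, d\theta.$$

Now to bound $\sum_{i=1}^n E_{K^*}\bigl(\hat h_i - h_{K^*}(\theta_i)\bigr)^2$, we can simply use the arguments from the proofs of Theorems 3.6 and 3.7. Therefore, to complete the proof of Theorem 3.8, we only need to show that

$$|b(\theta)| \le \frac{CR}{n} \quad \text{for every } \theta \in (-\pi, \pi] \qquad (85)$$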


for some universal constant $C$. For this, we use the hypothesis that $K^*$ is contained in a ball of radius $R$. Suppose that the center of the ball is $(x_1, x_2)$. Define $K' := K^* - \{(x_1, x_2)\} := \{(y_1, y_2) - (x_1, x_2) : (y_1, y_2) \in K^*\}$ and note that $h_{K'}(\theta) = h_{K^*}(\theta) - x_1\cos\theta - x_2\sin\theta$. It is then easy to see that $b(\theta)$ is the same for both $K^*$ and $K'$. It is therefore enough to prove (85) assuming that $(x_1, x_2) = (0, 0)$. In this case, it is straightforward to see that $|h_{K^*}(\theta)| \le R$ for all $\theta$ and also that $h_{K^*}$ is Lipschitz with constant $R$. Now, because $\max(\sin(\theta - \theta_i), \sin(\theta_{i+1} - \theta)) \le \sin(\theta_{i+1} - \theta_i)$, it can be checked that

$$|b(\theta)| \le |h_{K^*}(\theta)| \left| 1 - \frac{\sin(\theta_{i+1} - \theta)}{\sin(\theta_{i+1} - \theta_i)} - \frac{\sin(\theta - \theta_i)}{\sin(\theta_{i+1} - \theta_i)} \right| + |h_{K^*}(\theta_i) - h_{K^*}(\theta)| + |h_{K^*}(\theta_{i+1}) - h_{K^*}(\theta)|.$$

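Because $h_{K^*}$ is $R$-Lipschitz and bounded by $R$, it is clear that we only need to show

$$\left| 1 - \frac{\sin(\theta_{i+1} - \theta)}{\sin(\theta_{i+1} - \theta_i)} - \frac{\sin(\theta - \theta_i)}{\sin(\theta_{i+1} - \theta_i)} \right| \le \frac{C}{n}$$

in order to prove (85). For this, write $\alpha = \theta_{i+1} - \theta$ and $\beta = \theta - \theta_i$ so that the above expression becomes

$$\left| 1 - \frac{\sin\alpha + \sin\beta}{\sin(\alpha + \beta)} \right| \le |1 - \cos\alpha| + |1 - \cos\beta| \le \frac{\alpha^2 + \beta^2}{2} \le \frac{C}{n^2} \le \frac{C}{n}.$$

This completes the proof of Theorem 3.8.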
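The interpolation step can be illustrated numerically. The sketch below uses the sine-weighted interpolation form that appears in the definition of $b(\theta)$ (the function names and the test point are illustrative assumptions). It checks two things: the interpolant reproduces the support function $x_1\cos\theta + x_2\sin\theta$ of a single point exactly, and the weight defect satisfies the chain of bounds displayed above whenever $\alpha + \beta = 2\pi/n$ with $n \ge 4$.

```python
import math

def interpolate(h_left, h_right, theta, theta_left, theta_right):
    """Sine-weighted interpolation between values at theta_left and theta_right."""
    d = math.sin(theta_right - theta_left)
    w_left = math.sin(theta_right - theta) / d
    w_right = math.sin(theta - theta_left) / d
    return w_left * h_left + w_right * h_right

def weight_defect(alpha, beta):
    """|1 - (sin a + sin b) / sin(a + b)|: how far the two weights are from summing to one."""
    return abs(1.0 - (math.sin(alpha) + math.sin(beta)) / math.sin(alpha + beta))

def defect_bounds_hold(n, steps=9):
    """Check defect <= (1 - cos a) + (1 - cos b) <= (a^2 + b^2)/2 <= 2 pi^2 / n^2."""
    gap = 2 * math.pi / n
    for s in range(1, steps + 1):
        alpha = gap * s / (steps + 1)
        beta = gap - alpha
        mid = (1 - math.cos(alpha)) + (1 - math.cos(beta))
        quad = (alpha ** 2 + beta ** 2) / 2
        if not (weight_defect(alpha, beta) <= mid + 1e-14
                and mid <= quad + 1e-14
                and quad <= 2 * math.pi ** 2 / n ** 2 + 1e-14):
            return False
    return True
```

Since the defect is of order $1/n^2$, the bias term $b(\theta)$ is driven by the Lipschitz terms, which are of order $R/n$.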
6.6 Proofs of Corollaries in Section 3.1


The proofs of the corollaries stated in Section 3.1 are given here. For these proofs, we need some
simple properties of the $\Delta_k(\theta_i)$ which are stated and proved in Appendix A.

We start with the proof of Corollary 3.5. 


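Proof of Corollary 3.5. Fix $1 \le i \le n$. We will prove that $\breve k(i) \le k_*(i) \le \tilde k(i)$. Inequality (27) would then follow from Theorem 3.1. For simplicity, we write $\Delta_k$ for $\Delta_k(\theta_i)$, $f_k$ for $f_k(\theta_i)$, $g_k$ for $g_k(\theta_i)$, $k_*$ for $k_*(i)$, $\breve k$ for $\breve k(i)$ and $\tilde k$ for $\tilde k(i)$.

Inequality (89) in Lemma A.2 gives

$$\Delta_k \ge \frac{\sigma(\sqrt{6}-2)}{\sqrt{k+1}} \quad \text{for all } k > k_*,\ k \in \mathcal{I}.$$

Thus any $k \in \mathcal{I}$ for which $\Delta_k \le f_k < \sigma(\sqrt{6}-2)/\sqrt{k+1}$ has to satisfy $k \le k_*$. This proves $\breve k \le k_*$.

For $k_* \le \tilde k$, we first use inequality (88) in Lemma A.2 to obtain $\Delta_{k_*} \le 6(\sqrt{2}-1)\sigma/\sqrt{k_*+1}$. Also Lemma A.1 states that $k \mapsto \Delta_k$ is non-decreasing for $k \in \mathcal{I}$. We therefore have

$$g_k \le \Delta_k \le \Delta_{k_*} \le \frac{6(\sqrt{2}-1)\sigma}{\sqrt{k_*+1}} \le \frac{6(\sqrt{2}-1)\sigma}{\sqrt{k+1}} \quad \text{for all } k \le k_*,\ k \in \mathcal{I}.$$

Therefore any $k \in \mathcal{I}$ for which $g_k > 6(\sqrt{2}-1)\sigma/\sqrt{k+1}$ has to be larger than $k_*$. This proves $\tilde k \ge k_*$. The proof is complete. □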


We next give the proof of Corollary 3.3. 


Proof of Corollary 3.3. We only need to prove (20). Inequality (21) would then follow from Theorem 3.1. Fix $i \in \{1,\dots,n\}$ and suppose that $K^*$ is contained in a ball of radius $R$ centered at $(x_1, x_2)$. We shall prove below that $\Delta_k(\theta_i) \le 6\pi Rk/n$ for every $k \in \mathcal{I}$ and (20) would then follow from Corollary 3.5. Without loss of generality, assume that $\theta_i = 0$.


As in the proof of Theorem 3.8, we may assume that $K^*$ is contained in the ball of radius $R$ centered at the origin. This implies that $|h_{K^*}(\theta)| \le R$ for all $\theta$ and also that $h_{K^*}$ is Lipschitz with constant $R$. Note then that for every $k \in \mathcal{I}$ and $0 \le j \le k$, the quantity

$$Q := \frac{h_{K^*}(4j\pi/n) + h_{K^*}(-4j\pi/n)}{2} - \frac{\cos(4j\pi/n)}{\cos(2j\pi/n)}\,\frac{h_{K^*}(2j\pi/n) + h_{K^*}(-2j\pi/n)}{2}$$

can be bounded as

$$|Q| = \left| \frac{h_{K^*}(4j\pi/n) - h_{K^*}(2j\pi/n)}{2} + \frac{h_{K^*}(-4j\pi/n) - h_{K^*}(-2j\pi/n)}{2} - \frac{\cos(4j\pi/n) - \cos(2j\pi/n)}{\cos(2j\pi/n)}\,\frac{h_{K^*}(2j\pi/n) + h_{K^*}(-2j\pi/n)}{2} \right| \le \frac{6Rj\pi}{n}.$$

Here we used also the fact that $\cos(\cdot)$ is Lipschitz and $\cos(2j\pi/n) \ge 1/2$. The inequality $\Delta_k(0) \le 6\pi Rk/n$ then immediately follows. The proof is complete. □
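As a numerical illustration of the bound $\Delta_k(\theta_i) \le 6\pi Rk/n$, the sketch below evaluates the averaged quantity $\Delta_k(\theta)$ for the convex hull of a finite point set contained in a disc of radius $R$. The helper names and the use of a finite point set as a stand-in for $K^*$ are assumptions made for illustration only.

```python
import math
import random

def support(points, theta):
    """Support function of the convex hull of `points` in direction theta."""
    c, s = math.cos(theta), math.sin(theta)
    return max(x * c + y * s for x, y in points)

def delta(points, theta, k, n):
    """Average over j = 0..k of the gap between far-pair and rescaled near-pair averages."""
    total = 0.0
    for j in range(k + 1):
        a = 2 * math.pi * j / n
        far = (support(points, theta + 2 * a) + support(points, theta - 2 * a)) / 2
        near = (support(points, theta + a) + support(points, theta - a)) / 2
        total += far - math.cos(2 * a) / math.cos(a) * near
    return total / (k + 1)

def bound_holds(points, radius, n=128, tol=1e-9):
    """Check 0 <= Delta_k(theta_i) <= 6 pi R k / n on the grid for k = 1, 2, 4, ..., n/16."""
    for i in range(1, n + 1):
        theta = 2 * math.pi * i / n - math.pi
        k = 1
        while k <= n // 16:
            d = delta(points, theta, k, n)
            if d < -tol or d > 6 * math.pi * radius * k / n + tol:
                return False
            k *= 2
    return True

def sample_disc(m, radius, seed):
    """m random points in the disc of the given radius centred at the origin."""
    rng = random.Random(seed)
    pts = []
    for _ in range(m):
        r, t = radius * math.sqrt(rng.random()), rng.uniform(0, 2 * math.pi)
        pts.append((r * math.cos(t), r * math.sin(t)))
    return pts
```

Translating the point set leaves $\Delta_k$ unchanged, which is why the proof may take the ball to be centred at the origin.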
We conclude this section with a proof of Corollary 3.4.


Proof of Corollary 3.4. By Theorem 3.1, inequality (24) is a direct consequence of (23). We therefore only need to prove (23). Fix $k \in \mathcal{I}$ with

Tl 

k <- Oi). (86) 

It is then clear that 6i ± G [(^i(i), (/> 2 (i)] for every 0 < j < k. From (22), it follows that 

liK*(0) = xi cos 9 + X 2 sin 9 for all 9 = 9iziz -^,0 < j < k. 

n 

We now argue that $\Delta_k(\theta_i) = 0$. To see this, note first that $\Delta_k(\theta_i) = U_k(\theta_i) - L_k(\theta_i)$ has the alternative expression (36). Plugging in $h_{K^*}(\theta) = x_1\cos\theta + x_2\sin\theta$ in (36), one can see by direct computation that $\Delta_k(\theta_i) = 0$ for every $k \in \mathcal{I}$ satisfying (86). The definition (18) of $k_*(i)$

now immediately implies that 

k^{i) > min mm{9i - (p 2 {i) - 9i), cn^ 

for a small enough universal constant c. This proves (23) thereby completing the proof. □ 
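The vanishing of $\Delta_k(\theta_i)$ when the support function is locally of the form $x_1\cos\theta + x_2\sin\theta$ can be seen on a concrete example. The sketch below uses the square $[-1,1]^2$, whose support function is $|\cos\theta| + |\sin\theta|$; this choice of set and all function names are illustrative assumptions. At $\theta = \pi/4$ the vertex $(1,1)$ is the maximiser on a window of half-width $\pi/4$, so $\Delta_k$ vanishes, whereas at $\theta = 0$ (an edge direction) a direct computation gives $\Delta_k = \frac{1}{k+1}\sum_{j=0}^k \tan(2j\pi/n)$.

```python
import math

def support_square(theta):
    """Support function of the square [-1, 1]^2."""
    return abs(math.cos(theta)) + abs(math.sin(theta))

def delta(h, theta, k, n):
    """Delta_k(theta) for a support function h, following the form of expression (36)."""
    total = 0.0
    for j in range(k + 1):
        a = 2 * math.pi * j / n
        far = (h(theta + 2 * a) + h(theta - 2 * a)) / 2
        near = (h(theta + a) + h(theta - a)) / 2
        total += far - math.cos(2 * a) / math.cos(a) * near
    return total / (k + 1)

def mean_tan(k, n):
    """Closed form of Delta_k at theta = 0 for the square, valid while 4 k pi / n <= pi / 2."""
    return sum(math.tan(2 * math.pi * j / n) for j in range(k + 1)) / (k + 1)
```

The contrast between the two directions is the mechanism behind the adaptation: a larger window can be used where the set has a corner.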
A Some additional technical results and proofs

In this appendix, we provide additional technical results and proofs.

Proof of Lemma 2.1. The inequality $h_{K^*}(\theta) \le u(\theta, \phi)$ is obtained by using (1) with $\alpha_1 = \theta + \phi$, $\alpha_2 = \theta - \phi$ and $\alpha = \theta$. For $l(\theta, \phi) \le h_{K^*}(\theta)$, we use (1) with $\alpha_1 = \theta + 2\phi$, $\alpha_2 = \theta$ and $\alpha = \theta + \phi$ to obtain

$$h_{K^*}(\theta) \ge 2h_{K^*}(\theta + \phi)\cos\phi - h_{K^*}(\theta + 2\phi).$$

One similarly has $h_{K^*}(\theta) \ge 2h_{K^*}(\theta - \phi)\cos\phi - h_{K^*}(\theta - 2\phi)$ and $l(\theta, \phi) \le h_{K^*}(\theta)$ is deduced by averaging these two inequalities. □
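The two-sided bound can be tried out numerically. In the sketch below, the forms of $u$ and $l$ are the ones implied by the proof above, namely $u(\theta,\phi) = \{h(\theta+\phi) + h(\theta-\phi)\}/(2\cos\phi)$ and $l(\theta,\phi) = \cos\phi\,\{h(\theta+\phi) + h(\theta-\phi)\} - \{h(\theta+2\phi) + h(\theta-2\phi)\}/2$; treating these as the definitions, and using the convex hull of a finite point set for the convex set, are assumptions of this illustration.

```python
import math
import random

def support(points, theta):
    """Support function of the convex hull of `points` in direction theta."""
    c, s = math.cos(theta), math.sin(theta)
    return max(x * c + y * s for x, y in points)

def upper(points, theta, phi):
    """u(theta, phi): average of h at theta +/- phi, divided by cos(phi)."""
    near = support(points, theta + phi) + support(points, theta - phi)
    return near / (2 * math.cos(phi))

def lower(points, theta, phi):
    """l(theta, phi): average of the two one-sided lower bounds from the proof."""
    near = support(points, theta + phi) + support(points, theta - phi)
    far = support(points, theta + 2 * phi) + support(points, theta - 2 * phi)
    return math.cos(phi) * near - far / 2

def sandwich_holds(points, trials=1000, seed=0, tol=1e-9):
    """Check l <= h <= u at random directions theta and angles phi in (0, pi/4]."""
    rng = random.Random(seed)
    for _ in range(trials):
        theta = rng.uniform(-math.pi, math.pi)
        phi = rng.uniform(0.01, math.pi / 4)
        h = support(points, theta)
        if not (lower(points, theta, phi) - tol <= h <= upper(points, theta, phi) + tol):
            return False
    return True
```

For a single point both bounds hold with equality, which matches the fact that $x_1\cos\theta + x_2\sin\theta$ satisfies the circle-convexity condition with equality.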

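Lemma A.1. Recall the quantity $\Delta_k(\theta_i)$ defined in (36). The inequality $\Delta_{2k}(\theta_i) \ge 1.5\,\Delta_k(\theta_i)$ holds for every $1 \le i \le n$ and $0 \le k \le n/16$.

Proof. We may assume without loss of generality that $\theta_i = 0$. We will simply write $\Delta_k$ for $\Delta_k(\theta_i)$ below for notational convenience. Let us define, for $\theta \in \mathbb{R}$,

$$\delta(\theta) := \frac{h_{K^*}(2\theta) + h_{K^*}(-2\theta)}{2} - \frac{\cos 2\theta}{\cos\theta}\,\frac{h_{K^*}(\theta) + h_{K^*}(-\theta)}{2}.$$

Note then that $\Delta_k = \sum_{j=0}^k \delta(2j\pi/n)/(k+1)$. We shall first prove that

$$\delta(y) \ge \frac{\tan y}{\tan x}\,\delta(x) \quad \text{for every } 0 < y < \pi/4 \text{ and } x < y \le 2x. \qquad (87)$$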

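For this, first apply (1) to $\alpha_1 = 2x$, $\alpha_2 = x$ and $\alpha = y$ to get

$$h_{K^*}(y) \le \frac{\sin(y - x)}{\sin x}\,h_{K^*}(2x) + \frac{\sin(2x - y)}{\sin x}\,h_{K^*}(x).$$

We then apply (1) to $\alpha_1 = 2y$, $\alpha_2 = x$ and $\alpha = 2x$ to get (note that $2y - x < 2y < \pi/2$)

$$h_{K^*}(2y) \ge \frac{\sin(2y - x)}{\sin x}\,h_{K^*}(2x) - \frac{\sin(2y - 2x)}{\sin x}\,h_{K^*}(x).$$

Combining these two inequalities, we get (note that $2y < \pi/2$ which implies that $\cos 2y > 0$)

$$h_{K^*}(2y) - \frac{\cos 2y}{\cos y}\,h_{K^*}(y) \ge \alpha\,h_{K^*}(2x) - \beta\,h_{K^*}(x),$$

where

$$\alpha := \frac{\sin(2y - x)}{\sin x} - \frac{\cos 2y}{\cos y}\,\frac{\sin(y - x)}{\sin x}$$

and

$$\beta := \frac{\sin(2y - 2x)}{\sin x} + \frac{\cos 2y}{\cos y}\,\frac{\sin(2x - y)}{\sin x}.$$

It can be checked by a straightforward calculation that

$$\alpha = \frac{\tan y}{\tan x} \quad \text{and} \quad \beta = \frac{\tan y}{\tan x}\,\frac{\cos 2x}{\cos x}.$$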
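The "straightforward calculation" can be spot-checked numerically. The sketch below takes the coefficients that arise when the two circle-convexity inequalities (applied at $(2x, x, y)$ and at $(2y, x, 2x)$) are combined, namely $\alpha = \{\sin(2y-x) - (\cos 2y/\cos y)\sin(y-x)\}/\sin x$ and $\beta = \{\sin(2y-2x) + (\cos 2y/\cos y)\sin(2x-y)\}/\sin x$, and compares them with the closed forms $\tan y/\tan x$ and $(\tan y/\tan x)(\cos 2x/\cos x)$. These explicit forms and the function names are stated here as assumptions of the illustration.

```python
import math

def alpha(x, y):
    """Coefficient of h(2x) after combining the two circle-convexity inequalities."""
    scale = math.cos(2 * y) / math.cos(y)
    return (math.sin(2 * y - x) - scale * math.sin(y - x)) / math.sin(x)

def beta(x, y):
    """Coefficient of h(x) after combining the two circle-convexity inequalities."""
    scale = math.cos(2 * y) / math.cos(y)
    return (math.sin(2 * y - 2 * x) + scale * math.sin(2 * x - y)) / math.sin(x)

def closed_forms(x, y):
    """Closed forms tan(y)/tan(x) and tan(y) cos(2x) / (tan(x) cos(x))."""
    ratio = math.tan(y) / math.tan(x)
    return ratio, ratio * math.cos(2 * x) / math.cos(x)
```

The underlying identities are $\sin(2y-x)\cos y - \cos 2y\,\sin(y-x) = \sin y\cos x$ and $\sin(2y-2x)\cos y + \cos 2y\,\sin(2x-y) = \sin y\cos 2x$.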
It follows therefore that

$$h_{K^*}(2y) - \frac{\cos 2y}{\cos y}\,h_{K^*}(y) \ge \frac{\tan y}{\tan x}\left( h_{K^*}(2x) - \frac{\cos 2x}{\cos x}\,h_{K^*}(x) \right).$$

We similarly obtain

$$h_{K^*}(-2y) - \frac{\cos 2y}{\cos y}\,h_{K^*}(-y) \ge \frac{\tan y}{\tan x}\left( h_{K^*}(-2x) - \frac{\cos 2x}{\cos x}\,h_{K^*}(-x) \right).$$

The required inequality (87) now results by adding the above two inequalities. A trivial consequence of (87) is that $\delta(y) \ge \delta(x)$ for $0 < y < \pi/4$ and $x < y \le 2x$. Further, applying (87) to $y = 2x$ (assuming that $0 < x < \pi/8$), we obtain $\delta(2x) \ge 2\delta(x)$. Note that $\tan 2x = 2\tan x/(1 - \tan^2 x) \ge 2\tan x$ for $0 < x < \pi/8$.

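To prove $\Delta_{2k} \ge 1.5\,\Delta_k$, we fix $1 \le k \le n/16$ (note that the inequality is trivial when $k = 0$) and note that

$$\Delta_{2k} = \frac{1}{2k+1}\sum_{j=1}^k \left( \delta\!\left(\frac{2(2j-1)\pi}{n}\right) + \delta\!\left(\frac{4j\pi}{n}\right) \right)$$

where we used the fact that $\delta(0) = 0$. Using the bounds proved for $\delta(\theta)$, we have

$$\delta\!\left(\frac{2(2j-1)\pi}{n}\right) \ge \delta\!\left(\frac{2j\pi}{n}\right) \quad \text{and} \quad \delta\!\left(\frac{4j\pi}{n}\right) \ge 2\,\delta\!\left(\frac{2j\pi}{n}\right).$$

Therefore

$$\Delta_{2k} \ge \frac{3}{2k+1}\sum_{j=1}^k \delta\!\left(\frac{2j\pi}{n}\right) = \frac{3(k+1)}{2k+1}\,\Delta_k \ge 1.5\,\Delta_k$$

and this completes the proof. □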
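The doubling inequality of Lemma A.1 lends itself to a quick numerical sanity check. The sketch below evaluates $\Delta_k(\theta_i)$ on the grid $\theta_i = 2\pi i/n - \pi$ for the convex hull of a finite point set and tests $\Delta_{2k} \ge 1.5\,\Delta_k$ for $k = 1, 2, 4, \dots$ with $2k \le n/16$. The helper names, the point set and the restriction to $2k \le n/16$ are choices made for this illustration only.

```python
import math

def support(points, theta):
    """Support function of the convex hull of `points` in direction theta."""
    c, s = math.cos(theta), math.sin(theta)
    return max(x * c + y * s for x, y in points)

def delta(points, theta, k, n):
    """Delta_k(theta): average over j = 0..k of delta(2 j pi / n) centred at theta."""
    total = 0.0
    for j in range(k + 1):
        a = 2 * math.pi * j / n
        far = (support(points, theta + 2 * a) + support(points, theta - 2 * a)) / 2
        near = (support(points, theta + a) + support(points, theta - a)) / 2
        total += far - math.cos(2 * a) / math.cos(a) * near
    return total / (k + 1)

def doubling_holds(points, n=256, tol=1e-9):
    """Check Delta_{2k} >= 1.5 * Delta_k at every grid angle for k = 1, 2, 4, ..."""
    for i in range(1, n + 1):
        theta = 2 * math.pi * i / n - math.pi
        k = 1
        while 2 * k <= n // 16:
            if delta(points, theta, 2 * k, n) < 1.5 * delta(points, theta, k, n) - tol:
                return False
            k *= 2
    return True
```

A check of this kind does not replace the proof, but it is a cheap way to catch a sign or index error when implementing $\Delta_k$.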


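Lemma A.2. Fix $i \in \{1,\dots,n\}$. Consider $\Delta_k(\theta_i)$ (defined in (36)) and $k_*(i)$ (defined in (18)). We then have the following inequalities

$$\Delta_{k_*(i)}(\theta_i) \le \frac{6(\sqrt{2}-1)\sigma}{\sqrt{k_*(i) + 1}} \qquad (88)$$

and

$$\Delta_k(\theta_i) \ge \max\left( \frac{(\sqrt{6}-2)\sigma}{\sqrt{k+1}},\ \frac{(\sqrt{6}-2)\sigma\sqrt{k+1}}{2(k_*(i) + 1)} \right) \quad \text{for all } k > k_*(i),\ k \in \mathcal{I}. \qquad (89)$$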


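Proof. Fix $i \in \{1,\dots,n\}$. Below we simply denote $k_*(i)$ and $\Delta_k(\theta_i)$ by $k_*$ and $\Delta_k$ respectively for notational convenience.

We first prove (88). If $k_* \ge 2$, we have

$$\Delta_{k_*} + \frac{2\sigma}{\sqrt{k_* + 1}} \le \Delta_{k_*/2} + \frac{2\sigma}{\sqrt{k_*/2 + 1}} \le \Delta_{k_*/2} + \sqrt{2}\,\frac{2\sigma}{\sqrt{k_* + 1}}.$$

Using Lemma A.1 (note that $k_* \in \mathcal{I}$ and hence $k_* \le n/16$), we have $\Delta_{k_*/2} \le (2/3)\Delta_{k_*}$. We therefore have

$$\Delta_{k_*} + \frac{2\sigma}{\sqrt{k_* + 1}} \le \frac{2}{3}\,\Delta_{k_*} + \sqrt{2}\,\frac{2\sigma}{\sqrt{k_* + 1}}$$

which proves (88). Inequality (88) is trivial when $k_* = 0$. Finally, for $k_* = 1$, we have $\Delta_1 + \sqrt{2}\sigma \le \Delta_0 + 2\sigma = 2\sigma$ which again implies (88).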


We now turn to (89). Let $k'$ denote the smallest $k \in \mathcal{I}$ for which $k > k_*$. We start by proving the first part of (89):

$$\Delta_k \ge \frac{(\sqrt{6}-2)\sigma}{\sqrt{k+1}} \quad \text{for } k > k_*,\ k \in \mathcal{I}. \qquad (90)$$

Note first that if (90) holds for $k = k'$, then it holds for all $k \ge k'$ as well because $\Delta_k \ge \Delta_{k'}$ (from Lemma A.1) and $1/\sqrt{k+1} \le 1/\sqrt{k'+1}$. We therefore only need to verify (90) for $k = k'$. If $k_* = 0$, then $k' = 1$ and because

$$\Delta_1 + \frac{2\sigma}{\sqrt{2}} \ge \Delta_0 + 2\sigma = 2\sigma,$$

we obtain $\Delta_1 \ge (2 - \sqrt{2})\sigma$. This implies (90). On the other hand, if $k_* > 0$, then $k' = 2k_*$ and we can write

$$\Delta_{2k_*} + \frac{2\sigma}{\sqrt{2k_* + 1}} \ge \Delta_{k_*} + \frac{2\sigma}{\sqrt{k_* + 1}} \ge \frac{2\sigma}{\sqrt{k_* + 1}}.$$

This gives

$$\Delta_{2k_*} \ge \frac{2\sigma}{\sqrt{2k_* + 1}}\left( \sqrt{\frac{2k_* + 1}{k_* + 1}} - 1 \right)$$

which implies inequality (90) for $k = 2k_*$ because $(2k_* + 1)/(k_* + 1) \ge 3/2$. The proof of (90) is complete.

For the second part of (89), we use Lemma A.1 which states $\Delta_{2k} \ge 1.5\,\Delta_k \ge \sqrt{2}\,\Delta_k$ for all $k \in \mathcal{I}$. By a repeated application of this inequality, we get

$$\Delta_k \ge \sqrt{\frac{k+1}{k'+1}}\,\Delta_{k'} \quad \text{for all } k \ge k'.$$

Using (90) for $k = k'$, we get

$$\Delta_k \ge \sqrt{\frac{k+1}{k'+1}}\,\frac{(\sqrt{6}-2)\sigma}{\sqrt{k'+1}} = \frac{(\sqrt{6}-2)\sigma\sqrt{k+1}}{k'+1}.$$

The proof of (89) is now completed by observing that $k' \le 2k_* + 1$. □


Lemma A.3. Fix $i \in \{1,\dots,n\}$. For every $0 \le k \le n/8$, the variance of the random variable $\hat U_k(\theta_i)$ (defined in (10)) is at most $\sigma^2/(k+1)$. Also, for every $0 \le k \le n/16$, the variance of the random variable $\hat\Delta_k(\theta_i)$ (defined in (11)) is at most $\sigma^2/(k+1)$.


Proof. Fix $1 \le i \le n$. We shall first prove the bound for the variance of $\hat U_k(\theta_i)$ for a fixed $0 \le k \le n/8$. Note that

$$\hat U_k(\theta_i) = \frac{1}{k+1}\sum_{j=0}^k \frac{Y_{i+j} + Y_{i-j}}{2\cos(2j\pi/n)}.$$
It is therefore straightforward to see that

$$\mathrm{var}(\hat U_k(\theta_i)) = \frac{\sigma^2}{(k+1)^2}\left( 1 + \frac{1}{2}\sum_{j=1}^k \sec^2(2j\pi/n) \right).$$

For $1 \le j \le k \le n/8$, we have $\sec(2j\pi/n) \le \sqrt{2}$ because $2j\pi/n \le \pi/4$. The inequality $\mathrm{var}(\hat U_k(\theta_i)) \le \sigma^2/(k+1)$ then immediately follows.

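Let us now turn to the variance of $\hat\Delta_k(\theta_i)$. When $k = 0$, the conclusion is obvious since $\hat\Delta_k(\theta_i) = 0$. Otherwise, the expression (11) for $\hat\Delta_k(\theta_i)$ can be rewritten as

$$\hat\Delta_k(\theta_i) = S_1 + S_2 + S_3$$

where

$$S_1 := -\frac{1}{k+1}\sum_{j=1}^k I\{j \text{ is odd}\}\,\frac{\cos(4j\pi/n)}{\cos(2j\pi/n)}\,\frac{Y_{i+j} + Y_{i-j}}{2},$$

$$S_2 := \frac{1}{k+1}\sum_{j=1}^k I\{j \text{ is even}\}\left( 1 - \frac{\cos(4j\pi/n)}{\cos(2j\pi/n)} \right)\frac{Y_{i+j} + Y_{i-j}}{2}$$

and

$$S_3 := \frac{1}{k+1}\sum_{j=k+1}^{2k} I\{j \text{ is even}\}\,\frac{Y_{i+j} + Y_{i-j}}{2}.$$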

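$S_1$, $S_2$ and $S_3$ are clearly independent. Moreover, the different terms in each $S_i$ are also independent. Thus

$$\mathrm{var}(S_1) = \frac{\sigma^2}{2(k+1)^2}\sum_{j=1}^k I\{j \text{ is odd}\}\,\frac{\cos^2(4j\pi/n)}{\cos^2(2j\pi/n)},$$

$$\mathrm{var}(S_2) = \frac{\sigma^2}{2(k+1)^2}\sum_{j=1}^k I\{j \text{ is even}\}\left( 1 - \frac{\cos(4j\pi/n)}{\cos(2j\pi/n)} \right)^2,$$

and

$$\mathrm{var}(S_3) = \frac{\sigma^2}{2(k+1)^2}\sum_{j=k+1}^{2k} I\{j \text{ is even}\} \le \frac{\sigma^2}{2(k+1)}.$$

Now for $k \le n/16$ and $1 \le j \le k$,

$$0 \le \frac{\cos(4j\pi/n)}{\cos(2j\pi/n)} \le 1$$

which implies that $\mathrm{var}(S_1) + \mathrm{var}(S_2) \le \sigma^2/(2(k+1))$. Thus $\mathrm{var}(\hat\Delta_k(\theta_i)) \le \sigma^2/(k+1)$. □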
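The variance bounds can be checked exactly by collecting the coefficient that each observation receives. The sketch below assumes independent observations $Y_i$ with common variance $\sigma^2$, and takes $\hat U_k(\theta_i)$ and $\hat\Delta_k(\theta_i)$ to be the averages over $j = 0,\dots,k$ of $(Y_{i+j}+Y_{i-j})/(2\cos(2j\pi/n))$ and of $(Y_{i+2j}+Y_{i-2j})/2 - \{\cos(4j\pi/n)/\cos(2j\pi/n)\}(Y_{i+j}+Y_{i-j})/2$ respectively. These forms are inferred from the proof and from (36), and the function names are illustrative.

```python
import math
from collections import defaultdict

def var_U(k, n, sigma=1.0):
    """Exact variance of U-hat_k: sigma^2 times the sum of squared coefficients."""
    coef = defaultdict(float)  # offset m -> coefficient of Y_{i+m}
    for j in range(k + 1):
        w = 1.0 / (2 * math.cos(2 * math.pi * j / n) * (k + 1))
        coef[j] += w
        coef[-j] += w  # for j = 0 both updates hit offset 0, giving 1/(k+1)
    return sigma ** 2 * sum(c * c for c in coef.values())

def var_U_closed_form(k, n, sigma=1.0):
    """The closed form sigma^2 (1 + 0.5 * sum sec^2) / (k+1)^2 from the proof."""
    total = 1.0 + 0.5 * sum(1.0 / math.cos(2 * math.pi * j / n) ** 2
                            for j in range(1, k + 1))
    return sigma ** 2 * total / (k + 1) ** 2

def var_Delta(k, n, sigma=1.0):
    """Exact variance of Delta-hat_k, collecting coefficients by offset."""
    coef = defaultdict(float)
    for j in range(k + 1):
        r = math.cos(4 * math.pi * j / n) / math.cos(2 * math.pi * j / n)
        for sign in (1, -1):
            coef[sign * 2 * j] += 0.5 / (k + 1)
            coef[sign * j] -= 0.5 * r / (k + 1)
    return sigma ** 2 * sum(c * c for c in coef.values())
```

Collecting coefficients by offset handles the overlap between even offsets $m \le k$ (which receive $1 - \cos(4m\pi/n)/\cos(2m\pi/n)$) and the remaining offsets automatically, which is the bookkeeping that the decomposition into $S_1$, $S_2$, $S_3$ makes explicit.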

The following lemma was used in the proof of Theorem 3.2. 

Lemma A.4. Let $\Delta_k$ be the quantity (36) with $\theta_i = 0$, i.e.,

$$\Delta_k := \frac{1}{k+1}\sum_{j=0}^k \left( \frac{h_{K^*}(4j\pi/n) + h_{K^*}(-4j\pi/n)}{2} - \frac{\cos(4j\pi/n)}{\cos(2j\pi/n)}\,\frac{h_{K^*}(2j\pi/n) + h_{K^*}(-2j\pi/n)}{2} \right).$$

Then the following inequality holds for every $k \le n/16$:

$$\Delta_k \le \frac{h_{K^*}(4k\pi/n) + h_{K^*}(-4k\pi/n)}{2\cos(4k\pi/n)} - h_{K^*}(0).$$


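Proof. From Lemma A.1, it follows that $\delta(2i\pi/n) \le \delta(2k\pi/n)$ for all $1 \le i \le k$ (this follows by reapplying Lemma A.1 to $2i\pi/n, 4i\pi/n, \dots$ until we hit $2k\pi/n$). As a consequence, we have $\Delta_k \le \delta(2k\pi/n)$. Now, if $\theta = 2k\pi/n$ then $\theta \le \pi/8$ and we can write

$$\delta(\theta) = \frac{h_{K^*}(2\theta) + h_{K^*}(-2\theta)}{2} - \frac{\cos 2\theta}{\cos\theta}\,\frac{h_{K^*}(\theta) + h_{K^*}(-\theta)}{2}$$

$$= \cos 2\theta\left( \frac{h_{K^*}(2\theta) + h_{K^*}(-2\theta)}{2\cos 2\theta} - h_{K^*}(0) \right) - \cos 2\theta\left( \frac{h_{K^*}(\theta) + h_{K^*}(-\theta)}{2\cos\theta} - h_{K^*}(0) \right).$$

Because $h_{K^*}(\theta) + h_{K^*}(-\theta) \ge 2h_{K^*}(0)\cos\theta$ and $\cos 2\theta > 0$, we have

$$\delta(\theta) \le \cos 2\theta\left( \frac{h_{K^*}(2\theta) + h_{K^*}(-2\theta)}{2\cos 2\theta} - h_{K^*}(0) \right) \le \frac{h_{K^*}(2\theta) + h_{K^*}(-2\theta)}{2\cos 2\theta} - h_{K^*}(0).$$

The proof is complete. □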
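The inequality of Lemma A.4 can also be examined numerically. The sketch below compares $\Delta_k$ at a centre angle $\theta$ with the upper bound $\{h(\theta + 4k\pi/n) + h(\theta - 4k\pi/n)\}/(2\cos(4k\pi/n)) - h(\theta)$ for the convex hull of a finite point set. Working at a general centre $\theta$ instead of $0$, and the helper names, are assumptions made for this illustration.

```python
import math

def support(points, theta):
    """Support function of the convex hull of `points` in direction theta."""
    c, s = math.cos(theta), math.sin(theta)
    return max(x * c + y * s for x, y in points)

def delta(points, theta, k, n):
    """Delta_k centred at theta, following the form of expression (36)."""
    total = 0.0
    for j in range(k + 1):
        a = 2 * math.pi * j / n
        far = (support(points, theta + 2 * a) + support(points, theta - 2 * a)) / 2
        near = (support(points, theta + a) + support(points, theta - a)) / 2
        total += far - math.cos(2 * a) / math.cos(a) * near
    return total / (k + 1)

def upper_gap(points, theta, k, n):
    """Right-hand side of Lemma A.4 centred at theta."""
    a = 4 * math.pi * k / n
    pair = support(points, theta + a) + support(points, theta - a)
    return pair / (2 * math.cos(a)) - support(points, theta)

def lemma_a4_holds(points, n=256, tol=1e-9):
    """Check Delta_k <= upper_gap at every grid angle for 1 <= k <= n/16."""
    for i in range(1, n + 1):
        theta = 2 * math.pi * i / n - math.pi
        for k in range(1, n // 16 + 1):
            if delta(points, theta, k, n) > upper_gap(points, theta, k, n) + tol:
                return False
    return True
```

The right-hand side is the gap between the upper bound $u(\theta, 4k\pi/n)$ and the true value $h(\theta)$, so the lemma says that $\Delta_k$ is controlled by how loose that single upper bound is.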